LOAD_PAYLOAD instructions need the same treatment as any other
generator instruction: at least FB writes and typed surface messages
need a payload built with non-zero execution controls.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of these fields affect the behaviour of the instruction, so
CSE'ing a pair of otherwise matching instructions with different
values of these fields could actually break the program.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The newly reworked instruction is far more straightforward than the
original. Before, the LOAD_PAYLOAD instruction was lowered by a
complicated and broken-by-design pile of heuristics that tried to guess
force_writemask_all, exec_size, and a number of other factors for the
sources.
Instead, we use the header_size on the instruction to denote which sources
are "header sources". Each header source is required to be a single
physical hardware register that is copied verbatim. The registers that
follow are considered the actual payload registers; they have a width that
corresponds to the LOAD_PAYLOAD's exec_size and are treated as being
per-channel. This gives us a fairly straightforward lowering:
1) All header sources are copied directly using force_writemask_all and,
since they are guaranteed to be a single register, there are no
force_sechalf issues.
2) All non-header sources are copied using the exact same force_sechalf
and force_writemask_all modifiers as the LOAD_PAYLOAD operation itself.
3) In order to accommodate older gens that need interleaved colors,
lower_load_payload detects when the destination is a COMPR4 register
and automatically interleaves the non-header sources. The
lower_load_payload pass does the right thing here regardless of whether
or not the hardware actually supports COMPR4.
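In code, the lowering amounts to roughly the following (an illustrative
sketch, not the exact pass: helper names like MOV(), offset(), and
byte_offset() are the ones used elsewhere in the backend, insertion
details are approximate, and COMPR4 interleaving is omitted):

   fs_reg dst = inst->dst;
   for (int i = 0; i < inst->sources; i++) {
      fs_inst *mov = MOV(dst, inst->src[i]);
      if (i < inst->header_size) {
         /* Case 1: a header source is one physical register, copied
          * verbatim under force_writemask_all. */
         mov->force_writemask_all = true;
         dst = byte_offset(dst, REG_SIZE);
      } else {
         /* Case 2: a payload source is copied per-channel with the
          * LOAD_PAYLOAD's own execution controls. */
         mov->force_sechalf = inst->force_sechalf;
         mov->force_writemask_all = inst->force_writemask_all;
         dst = offset(dst, 1);
      }
      inst->insert_before(mov);
   }
   inst->remove();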
This commit is itself made up of a bunch of smaller changes squashed
together. Individual change descriptions follow:
i965/fs: Rework fs_visitor::LOAD_PAYLOAD
We rework LOAD_PAYLOAD to verify that all of the sources that count as
headers are, indeed, exactly one register and that all of the non-header
sources match the destination width. We then take the exec_size for
LOAD_PAYLOAD directly from the destination width.
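Expressed as assertions, the new invariants look something like this
(a sketch; the local variable names and the fs_inst constructor shape
are approximate):

   assert(dst.width % 8 == 0);
   fs_inst *inst = new(mem_ctx) fs_inst(SHADER_OPCODE_LOAD_PAYLOAD,
                                        dst.width, dst, src, sources);
   inst->header_size = header_size;
   for (int i = 0; i < sources; i++) {
      if (i < header_size) {
         /* Header sources must be exactly one physical register. */
         assert(src[i].width * type_sz(src[i].type) == REG_SIZE);
      } else {
         /* Non-header sources must match the destination width. */
         assert(src[i].width == dst.width);
      }
   }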
i965/fs: Make destinations of load_payload have the appropriate width
i965/fs: Rework fs_visitor::lower_load_payload
v2: Don't allow the saturate flag on LOAD_PAYLOAD instructions
i965/fs_cse: Support the new-style LOAD_PAYLOAD
i965/fs_inst::is_copy_payload: Support the new-style LOAD_PAYLOAD
i965/fs: Simplify setup_color_payload
Previously, setup_color_payload was a big helper function that did a
lot of gen-specific special-casing to set up the color sources of
the LOAD_PAYLOAD instruction. Now that lower_load_payload is much more
sane, most of that complexity isn't needed anymore. Instead, we can do
a simple fixup pass for color clamps and then just stash sources
directly in the LOAD_PAYLOAD. We can trust lower_load_payload to do the
right thing with respect to COMPR4.
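The resulting helper is roughly this shape (a sketch; tmp, color,
length, and sources[] are illustrative names, and clamp_fragment_color
is the prog-key flag that triggers the fixup):

   for (unsigned i = 0; i < 4; i++) {
      if (key->clamp_fragment_color) {
         /* Fixup: saturate the color into a temporary first. */
         fs_inst *mov = emit(MOV(offset(tmp, i), offset(color, i)));
         mov->saturate = true;
         sources[length + i] = offset(tmp, i);
      } else {
         /* Otherwise stash the source directly in the payload. */
         sources[length + i] = offset(color, i);
      }
   }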
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a new is_copy_payload helper to fs_inst that takes the
place of the similarly named functions in cse and register coalesce. The
two is_copy_payload functions in CSE and register coalesce were subtly
different and potentially subtly broken. The new version unifies the two
and should be more correct.
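Conceptually, the unified helper asks whether the LOAD_PAYLOAD merely
reassembles one contiguous virtual GRF (a sketch assuming all sources
share a type and width; the real helper also accounts for per-source
types and strides):

   bool
   fs_inst::is_copy_payload(const brw::simple_allocator &grf_alloc) const
   {
      if (opcode != SHADER_OPCODE_LOAD_PAYLOAD)
         return false;

      /* The first source must be the start of a virtual GRF exactly as
       * large as everything this instruction writes. */
      fs_reg reg = src[0];
      if (reg.file != GRF || reg.reg_offset != 0 ||
          grf_alloc.sizes[reg.reg] != (unsigned) regs_written)
         return false;

      /* Every source must be the next consecutive chunk of that GRF. */
      for (int i = 0; i < sources; i++) {
         if (!src[i].equals(reg))
            return false;
         reg = offset(reg, 1);
      }
      return true;
   }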
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Get rid of the block parameter and make src a const reference
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Save some CPU cycles by doing 'return progress' rather than
'depth++' in the discard jump special case.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
a * b is equivalent to -a * -b, and the previous code failed to
recognize that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89961
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
mul x, -y is equivalent to mul -x, y; and mul x, y is the negation of
mul x, -y.
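For example (illustrative IR, not actual compiler output):

mul vgrf1, vgrf2, -vgrf3
...
mul vgrf4, -vgrf2, vgrf3   -> reuse vgrf1 directly; the two negations cancel
mul vgrf5, vgrf2, vgrf3    -> mov vgrf5, -vgrf1, negating the CSE'd result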
With NIR:
total instructions in shared programs: 6167779 -> 6161193 (-0.11%)
instructions in affected programs: 983511 -> 976925 (-0.67%)
helped: 4106
HURT: 16
GAINED: 18
LOST: 7
Without NIR:
total instructions in shared programs: 6192323 -> 6185299 (-0.11%)
instructions in affected programs: 987875 -> 980851 (-0.71%)
helped: 4146
HURT: 16
GAINED: 16
LOST: 0
Create a backend_inst::is_commutative() method to replace two static
functions that did the exact same thing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Right now, virtual GRF book-keeping and allocation are performed in each
visitor class separately (among a hundred other things), leading to
duplicated logic in each visitor and preventing layering, since any
code that manipulates i965 IR and needs to allocate virtual registers
is forced to depend on the specific visitor that happens to be used to
translate from GLSL IR.
v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).
Reviewed-by: Matt Turner <mattst88@gmail.com>
When performing common subexpression elimination on instructions with
non-null destinations, we emit a MOV to copy the result to a new
register that must have no other uses. In the case of:
cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f
...
cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f
we put the first instruction in the AEB and decided that we could reuse
its result when we found the second. Unfortunately, that meant that we'd
emit a MOV from the first's destination, which is null.
Don't do anything if the entry's destination is null and the
instruction's destination is non-null.
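The fix is a guard of this shape in the pass (a sketch; the surrounding
loop structure is omitted):

   /* We can't reuse a result that was never written: if the generating
    * instruction's destination is null but ours isn't, leave this
    * instruction alone. */
   if (entry->generator->dst.is_null() && !inst->dst.is_null())
      continue;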
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Safe from causing optimization loops, since we don't constant propagate
VF arguments.
(for this and the previous patch):
total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
instructions in affected programs: 1616779 -> 1599636 (-1.06%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.
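Since the two never coexist, a single field can serve both roles (a
sketch of the merge; the exact type and placement in
backend_instruction are approximate):

   /* Before: two fields, never used together. */
   uint32_t texture_offset;
   uint32_t offset;

   /* After: one field serves both roles. */
   uint32_t offset; /* packed texture offsets, or spill/URB offsets */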
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+
math instruction: it's a single instruction (SEND) with a GRF source.
The difference is that it also implicitly clobbers a message register.
The only visible effect is that CSE will remove the MRF-clobbering from
later math operations. This should be fine; compute_to_mrf and
remove_redundant_mrf_writes don't look at the values populated by
implied writes, so they can't rely on those values being present.
Less interference may actually help those passes make more progress.
Binary math is still problematic, since it involves a separate MOV
instruction to load the second operand. We continue disallowing CSE for
binary math operations.
total instructions in shared programs: 3340303 -> 3340100 (-0.01%)
instructions in affected programs: 26927 -> 26724 (-0.75%)
Nothing hurt, gained, or lost. ~6% reduction on a few shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is actually the squash of a bunch of different changes. Individual
commit titles follow:
i965/fs: Always 2-align registers in SIMD16 for gen <= 5
i965/fs: Use the register width when applying offsets
This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.
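The relationship between the two helpers is roughly (a sketch covering
only the GRF case; the real functions handle the other register files
too):

   static inline fs_reg
   byte_offset(fs_reg reg, unsigned delta)
   {
      /* Walk whole 32-byte registers via reg_offset and the remainder
       * via subreg_offset. */
      reg.reg_offset += delta / REG_SIZE;
      reg.subreg_offset += delta % REG_SIZE;
      return reg;
   }

   static inline fs_reg
   offset(fs_reg reg, unsigned delta)
   {
      /* Scale the element count by the register width and per-element
       * type size, then defer to byte_offset(). */
      return byte_offset(reg, delta * reg.width * type_sz(reg.type));
   }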
i965/fs: Change regs_read to be in hardware registers
i965/fs: Change regs_written to be actual hardware registers
i965/fs: Properly handle register widths in LOAD_PAYLOAD
The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.
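So locating a given source within the payload is a running sum over the
earlier sources (a sketch; src_index is a hypothetical name for the
source being located):

   unsigned reg_offset = 0;
   for (int i = 0; i < src_index; i++) {
      if (i < inst->header_size) {
         /* Header sources occupy exactly one register. */
         reg_offset += 1;
      } else {
         /* Payload sources occupy width * type size bytes, rounded up
          * to whole registers. */
         reg_offset += DIV_ROUND_UP(inst->src[i].width *
                                    type_sz(inst->src[i].type), REG_SIZE);
      }
   }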
i965/fs: Properly set writemasks in LOAD_PAYLOAD
i965/fs: Handle register widths in demote_pull_constants
i965/fs: Get rid of implicit register doubling in the allocator
i965/fs: Reserve enough registers for PLN instructions
i965/fs: Make sources and destinations interfere in 16-wide
i965/fs: Properly handle register widths in CSE
i965/fs: Properly handle register widths in register_coalesce
i965/fs: Properly handle widths in copy propagation
i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
i965/fs: Properly handle register widths and odd register sizes in spilling
i965/fs: Don't waste a register on texture lookups for gen >= 7
Previously, we were wasting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special case.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset do even more nifty things for us.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places where we didn't add or remove an
instruction.
cfg calculations: 202951 -> 80307 (-60.43%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This causes problems when converting atomics to use the GRF. Sometimes
the atomic operation would get eaten by CSE when it shouldn't be.
v2: Roll the has_side_effects check into is_expression
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The generators no longer use this.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should potentially allow a few more cases, while avoiding CSE on
texture operations that use the MRF on Gen <= 6.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80211
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: lu hua <huax.lu@intel.com>
Since CSE creates instructions, if we let CSE generate things register
coalescing can't remove, bad things will happen. Only let CSE combine
non-copy load_payloads.
E.g., allow CSE to handle this
load_payload vgrf4+0, vgrf5, vgrf6
but not this
load_payload vgrf4+0, vgrf5+0, vgrf5+1
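The check is a small static helper local to the pass (a sketch; this is
one of the two functions that the is_copy_payload unification described
earlier in this log later replaced):

   /* A LOAD_PAYLOAD is a plain copy if its sources are consecutive
    * offsets of one virtual GRF starting at offset 0. */
   static bool
   is_copy_payload(const fs_inst *inst)
   {
      if (inst->src[0].file != GRF || inst->src[0].reg_offset != 0)
         return false;

      for (int i = 1; i < inst->sources; i++) {
         if (inst->src[i].file != GRF ||
             inst->src[i].reg != inst->src[0].reg ||
             inst->src[i].reg_offset != i)
            return false;
      }
      return true;
   }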
total instructions in shared programs: 2081469 -> 2081248 (-0.01%)
instructions in affected programs: 22606 -> 22385 (-0.98%)
No programs were hurt by this patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Or else poor programmers might mistakenly use the temporary mem_ctx
instead of the fs_visitor's mem_ctx and wonder why their code is
crashing.
Also remove the parenting. These contexts are local to the optimization
passes they're in and are freed at the end.
parent_mem_ctx has been unused since db47074a, so remove the two
wrappers around create() and make create() the constructor.
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit b16b3c87 began performing CSE on CMP instructions with null
destinations. I relaxed the restrictions a bit too much, thereby
allowing CSE to be performed on instructions with, for instance, an
explicit accumulator destination.
This broke the arb_gpu_shader5/fs-imulExtended shader tests because
they emit MUL instructions with the accumulator as the destination. CSE
would instead cause the MUL to write to a GRF, which is lower precision
than the accumulator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
We'd like to CSE some instructions, like CMP, that often have null
destinations. Instead of replacing them with MOVs to null, just don't
emit the MOV.
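In the pass, that amounts to a null check around the copy (a sketch;
MOV() and the insertion details are approximate):

   /* Replace the redundant instruction with a MOV from the temporary
    * only when it actually writes something. */
   if (!inst->dst.is_null()) {
      fs_inst *copy = MOV(inst->dst, entry->tmp);
      inst->insert_before(copy);
   }
   inst->remove();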
Reviewed-by: Paul Berry <stereotype441@gmail.com>