Commit graph

103 commits

Author SHA1 Message Date
Ian Romanick
fef175de09 intel/brw: Enable constant propagation for a couple more logical sends
This prevents some regressions later in the MR. Once load_const
operations are marked as is_scalar, they will cesase to get the
automatic constant propagation that occurs in try_rebuild_source.

No shader-db or fossil-db changes on any Intel platform.

v2: Slightly relax source restrictions on
SHADER_OPCODE_UNALIGNED_OWORD_BLOCK_READ_LOGICAL. Add a comment
explaining the restriction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
572e00dd66 intel/brw: Copy prop from raw integer moves with mismatched types
The specific pattern from the unit test was observed in ray tracing
trampoline shaders.

v2: Refactor the is_raw_move tests out to a utility function. Suggested
by Ken.

v3: Fix a regression caused by being too picky about source
modifiers. This was introduced somewhere between when I did initial
shader-db runs an v2.

v4: Fix typo in comment. Noticed by Caio.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19734086 -> 19733997 (<.01%)
instructions in affected programs: 135388 -> 135299 (-0.07%)
helped: 76 / HURT: 2

total cycles in shared programs: 916290451 -> 916264968 (<.01%)
cycles in affected programs: 41046002 -> 41020519 (-0.06%)
helped: 32 / HURT: 29

fossil-db:

Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531355 -> 151513669 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17209372399 -> 17208178205 (-0.01%); split: -0.01%, +0.00%
Max live registers: 32016490 -> 32016493 (+0.00%)

Totals from 17361 (2.75% of 630198) affected shaders:
Instrs: 2642048 -> 2624362 (-0.67%); split: -0.67%, +0.00%
Cycle count: 79803066 -> 78608872 (-1.50%); split: -1.75%, +0.25%
Max live registers: 421668 -> 421671 (+0.00%)

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149995644 -> 149977326 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15567293770 -> 15566524840 (-0.00%); split: -0.02%, +0.01%
Spill count: 61241 -> 61238 (-0.00%)
Fill count: 107304 -> 107301 (-0.00%)
Max live registers: 31993109 -> 31993112 (+0.00%)

Totals from 17813 (2.83% of 629912) affected shaders:
Instrs: 3738236 -> 3719918 (-0.49%); split: -0.49%, +0.00%
Cycle count: 4251157049 -> 4250388119 (-0.02%); split: -0.06%, +0.04%
Spill count: 28268 -> 28265 (-0.01%)
Fill count: 50377 -> 50374 (-0.01%)
Max live registers: 470648 -> 470651 (+0.00%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Kenneth Graunke
da395e6985 intel/brw: Fix extract_imm for subregion reads of 64-bit immediates
We could be trying to extract a D/UD from a Q/UQ, for example.  We were
ignoring the top 32-bits, which is incorrect.

Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>
2024-08-28 12:33:26 -07:00
Kenneth Graunke
51c85e0363 intel/brw: Drop misguided sign extension attempts in extract_imm()
This function never expands a type - it only narrows it.  As such, we
don't need to ever sign extend to fill additional new bits.  I think
this code was left over from earlier versions of my optimization pass
that was buggy and trying to handle cases it should not have.

Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>
2024-08-28 12:33:26 -07:00
Caio Oliveira
8ba8e33c39 intel/brw: Simplify @file annotations
Doxygen documentation says

> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.

so we can omit the filename itself when using the annotation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>
2024-07-22 22:48:03 +00:00
Caio Oliveira
71ccf8e4cd intel/brw: Rename fs_reg_* helpers to brw_reg_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Caio Oliveira
3670c24740 intel/brw: Replace uses of fs_reg with brw_reg
And remove the fs_reg alias.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29791>
2024-07-03 02:53:19 +00:00
Kenneth Graunke
9e750f00c3 intel/brw: Make opt_copy_propagation_defs clean up its own trash
Copy propagation often eliminates all uses of an instruction.  If we
detect that we've done so, we can eliminate the instruction ourselves
rather than leaving it hanging until the next DCE pass.

This saves some CPU time as other passes don't see dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
580e1c592d intel/brw: Introduce a new SSA-based copy propagation pass
(Quite a few of the restrictions here are ported from the old pass.)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00
Kenneth Graunke
3da444b79e intel/brw: Refactor code to commute immediates into legal positions
This will let us reuse this in a new pass shortly.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:12 -07:00
Kenneth Graunke
d45da713e7 intel/brw: Refactor try_constant_propagate()
This will let us reuse the bulk of this code in a new copy propagation
pass without replicating it.  We retain a wrapper function for dealing
with ACP entries, which the new pass won't have.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:10 -07:00
Kenneth Graunke
85aa6f80af intel/brw: Drop BRW_OPCODE_IF from try_constant_propagate
This was for Sandybridge's IF with embedded comparison, which only
existed for a single generation of hardware.  Since the compiler fork,
we no longer support Sandybridge here.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:08 -07:00
Kenneth Graunke
7019bc4469 intel/brw: Drop compiler parameter from try_constant_propagate()
This is unused.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:06 -07:00
Ian Romanick
033405cd4b intel/brw: Combine constants and constant propagation for CSEL
No shader-db or fossil-db changes on any Intel platform. This ends up
begin helpful in "intel/brw: Use range analysis to optimize fsign."

v2: Add integer CSEL support

v3: Massive simplification (-20 lines!) of constant propagation
logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm.
Noticed by Ken.

v4: While MAD can mix F and HF sources on some platforms, CSEL
cannot. Found by skqp on TGL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Kenneth Graunke
545bb8fb6f intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_*
Both of these helpers do the same thing.  We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Francisco Jerez
62aab1437e intel/fs/gfx20+: Handle subdword integer regioning restrictions in copy propagation.
This makes sure that copy propagation doesn't undo the lowering of
restricted sub-dword integer regions done by brw_fs_lower_regioning().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
2024-04-22 18:02:32 -07:00
Ian Romanick
0e817ba548 intel/brw/xe2+: Implement Wa 22016140776
HF sources to math instructions cannot be scalar. This is very similar
to an old Gfx6 restriction on POW, so let's fix it in a similar way.

As an extra bit of saftey, lower any occurances that might slip through
in brw_fs_lower_regioning.

The primary change is to prevent copy propagation from violating the
restriction. With that change, nothing should be able to generate these
invalid source strides. The modification to fs_visitor::validate should
detect potential problems sooner rather than later.

Previous attempts to implement this Wa when emitting the math
instruction (in brw_eu_emit.c gfx6_math) didn't work for several
reasons. The lowering happens after the SWSB pass, so the scoreboarding
was incorrect (thanks to Curro for finding that). In addition, the
lowering happens after register allocation, so it's impossible to
allocate a non-scalar register to expand the scalar value.

Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL.

v2: Add changes to brw_fs_lower_regioning. Suggested by Curro.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>
2024-04-04 21:04:09 -07:00
Rohan Garg
a715512177 intel/brw: adjust the copy propgation pass to account for wider GRF's on Xe2+
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>
2024-03-28 19:53:40 +00:00
Kenneth Graunke
c0c05c1041 intel/brw: Fix destination stride assertion in copy propagation
We were asserting that entry->dst.offset % REG_SIZE == 0, which is
easily tripped by a simple LOAD_PAYLOAD that writes a 16-bit vec2:

    load_payload(8) vgrf1:UW, vgrf2+0.0:UW, vgrf3+0.0:UW

We create separate ACP entries corresponding to the values coming from
vgrf2 and vgrf3, with entry->dst set to the location within vgrf1 where
those sources get written to.  So the second entry will have offset 16,
which is not REG_SIZE aligned.

It looks like this assert was originally added back in 2014 (see commit
1728e74957) and adjusted through the ages,
including at a point when we combined reg and subreg offsets into a
single byte offset, and over time also extended copy propagation.

Here the destination offset is already accounted for via rel_offset,
at the byte offset level, so things ought to work and there is no need
to assert that this is the case.  Ian had already noted that the assert
tripped in commit e3f502e007, but checking
for inst->opcode == MOV here doesn't really make sense - it's just the
case that he found that broke.

Remove the erroneous assertion.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
2024-03-27 04:52:17 +00:00
Ian Romanick
d9674cbe7d intel/brw: Combine constants for src0 of POW instructions too
I tried this when I was working on MR !7698, and it didn't have much
affect back then. Maybe I've added more stuff to my fossil-db?

Gfx12 platforms (Tiger Lake and DG2) are unaffected because the POW
instruction was removed.

shader-db:

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20301933 -> 20301900 (<.01%)
instructions in affected programs: 9077 -> 9044 (-0.36%)
helped: 33 / HURT: 0

total cycles in shared programs: 842797624 -> 842799471 (<.01%)
cycles in affected programs: 1361911 -> 1363758 (0.14%)
helped: 35 / HURT: 111

LOST:   0
GAINED: 9

fossil-db:

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 165510222 -> 165510163 (-0.00%)
Cycles: 15125195835 -> 15125194484 (-0.00%); split: -0.00%, +0.00%
Spill count: 45204 -> 45196 (-0.02%)
Fill count: 74157 -> 74149 (-0.01%)

Totals from 65 (0.01% of 656118) affected shaders:
Instrs: 57426 -> 57367 (-0.10%)
Cycles: 1667918 -> 1666567 (-0.08%); split: -0.11%, +0.03%
Spill count: 137 -> 129 (-5.84%)
Fill count: 515 -> 507 (-1.55%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
2024-03-12 21:31:30 +00:00
Ian Romanick
e7480f94c1 intel/brw: Combine constants for src0 of integer multiply too
The majority of cases that would have been affected by this actually
had both sources as integer constants. The earlier commit "intel/rt:
Don't directly generate umul_32x16" allowed those to be constant
folded.

v2: Move the a*-1 block to be near the existing a*-1 block.

No shader-db changes on any Intel platform.

fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165510246 -> 165510222 (-0.00%)
Cycles: 15125198238 -> 15125195835 (-0.00%); split: -0.00%, +0.00%

Totals from 46 (0.01% of 656118) affected shaders:
Instrs: 36010 -> 35986 (-0.07%)
Cycles: 2613658 -> 2611255 (-0.09%); split: -0.17%, +0.07%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
2024-03-12 21:31:30 +00:00
Kenneth Graunke
f6ac6c94a9 intel/brw: Handle SHADER_OPCODE_SEND without src[3] in copy prop
We construct some SENDs with only 3 sources (such as FB writes).
This code could read out of bounds.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
2024-03-05 11:39:26 +00:00
Kenneth Graunke
49606ab067 intel/brw: Avoid copy propagating any fixed registers into EOTs
We were handling FIXED_GRF, but we probably also ought to handle ATTR
(pushed inputs) and UNIFORM (pushed constants).  Just check if file
isn't VGRF to handle everything.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
2024-03-05 11:39:26 +00:00
Kenneth Graunke
45a5e4c0c4 intel/brw: Delete SHADER_OPCODE_TXF_UMS
Nothing seems to generate this anymore.  I guess we always use CMS.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
2024-03-01 22:19:51 +00:00
Kenneth Graunke
601ef12467 intel/brw: Delete SHADER_OPCODE_TXF_CMS[_LOGICAL]
We always use the wide variant (_W) on hardware this compiler supports.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
2024-03-01 22:19:50 +00:00
Caio Oliveira
082735750b intel/brw: Simplify usage of reg immediate helpers
Use fs_reg and don't take the type as argument.  In all uses the type
passed is the type of the register.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27904>
2024-03-01 17:52:09 +00:00
Caio Oliveira
8f3c52c1da intel/brw: Remove MRF type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
5c93a0e125 intel/brw: Remove Gfx8- remaining opcodes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
7ac5696157 intel/brw: Remove Gfx8- code from backend passes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Sagar Ghuge
6f0ab5e4d5 intel/compiler: Add texture gather offset LOD/Bias message support
v2: (Ian)
- Space formatting on conditional statement

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Sagar Ghuge
79af0ac29a intel/compiler: Add gather4_i/l/[_c]/b sampler message
v2: (Ian)
- Format comment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Caio Oliveira
6a3329a6c4 intel/brw: Pull opt_copy_propagation out of fs_visitor
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
2024-02-26 20:54:24 +00:00
Caio Oliveira
b5cd91501d intel/fs: Use linear allocator in opt_copy_propagation
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25670>
2024-01-04 23:06:07 +00:00
Caio Oliveira
6d2503e935 intel/fs: Only allocate acp_entry if we are adding one
In practice it seems we are always entering here, haven't looked
in detail whether at this point we could just assert.  But for now
only allocate a new acp_entry if we are going to add it.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25670>
2024-01-04 23:06:07 +00:00
Francisco Jerez
6bf99e6a45 intel/compiler: Don't change types for copies from ATTR file.
Since the <8;8,0> regions they use in multipolygon mode could violate
regioning restrictions in some cases, depending on the execution type
of the instruction.  Note that the assertion is removed from
try_copy_propagate() since a more accurate check is used within that
function than what fs_inst::can_change_types() can do.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:31 +00:00
Francisco Jerez
2ed36050fb intel/fs: Don't copy-propagate ATTR registers in multi-polygon FS shaders when invalid.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:31 +00:00
Jordan Justen
3f89fa63e6 intel/compiler: Pass max_polygons to copy-prop from fs_visitor.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:31 +00:00
Ian Romanick
92f5442489 intel/fs: Merge copy prop dataflow loops
This is kept as a separate commit because the change looks like a lot
more than it it. The order of the two loops is swapped, then the two
loops are merged.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
fa2757aa97 intel/fs: Use rb_tree for copy prop dataflow
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
35644bb483 intel/fs: Use rb_tree to store ACP entries by destination
Using a single data structure seems better. There's no appreciable
performance change. On batman_arkham_city_goty.foz, the difference
reported was 0.48%±0.36% (n=20). Several commits in the MR, including
some that should have no effect at all, reported similar changes. I
attribute this primarily changing of loop alignments and similar.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
c28bf1a249 intel/fs: Use rb_tree to store ACP entries by source
On batman_arkham_city_goty.foz, this improves fossil-db time by
-3.83%±0.24% (n=20). This fossil takes the longest time of any in my
database.

v2: Add some comments for cmp_entry_src_entry_src and
cmp_entry_src_nr. Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
06bdd3eac0 intel/fs: Encapsulate per-block ACP in a structure
This simplifies some later changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
c262752d74 intel/fs: Make opt_copy_propagation_local file private
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
0946108298 intel/fs: Simplify check in can_propagate_from
The larger predicate here already requires that inst->opcode must be
BRW_OPCODE_MOV, so it can't BRW_OPCODE_SEL. With that removed, the
other simplifications are pretty straight forward.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
1f15a0f8b2 intel/fs: Don't loop in try_constant_propagate
The caller already loops over the sources. This means that the caller
must loop over the sources in reverse because constant propagation
prefers to propagate into the last sources first.

The shader-db and fossil-db changes (below) are all due to SEL
instructions. Changing the order sources are visited changes whether a
SEL with two immediate sources is

    (+f0.0) sel     g12    IMM_A    IMM_B

or

    (-f0.0) sel     g12    IMM_B    IMM_A

The ordering of the sources affects the order the constant combining
encounters the values, and the determines which value is "combined"
and which value remains an immediate.

This affects the results by luck. If there are two instructions:

    (+f0.0) sel     g12    IMM_A    IMM_B
    (+f0.0) sel     g13    IMM_A    IMM_C

Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the
selection algorithm in constant combining is greedy, this case
requires the algorithm see the values in just the right order for the
right thing to happen.

v2: Rebase on many, many changes. Move instruction source fixup
reordering out or try_constant_propagate.

v3: Rebase on !7698.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
ab23d89ade intel/fs: Move src.file checks out of try_constant_propagate and try_copy_propagate
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
b5b2338c5c intel/fs: Make try_constant_propagate and try_copy_propagate file private
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:22 +00:00
Ian Romanick
8665e37960 intel/fs: Don't try to copy propagate into a source again after progress is made
If the linked list structure used depended on the list head to know when
to terminate, this would be a pretty serious bug. If try_constant_propage
or try_copy_propagate make progress, inst->src[i].nr will change. This
results in the foreach_in_list using a different list header on later
iterations of the loop.

This causes two shaders in shader-db and 9 shaders in fossil-db to
change. Looking at the code changes, these are cases where there was a
copy of a copy that gets propagated. The part that confuses me is the
VGRF numbers involved should **not** hash to the same bucket, so it
should be impossible to find the original source from the intermediate
VGRF.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:22 +00:00
Ian Romanick
c506d7e511 intel/fs: Combine constants for integer instructions too
v2: Remove type change for SHR with negation.  This was a leftover from
a previous attempt to deal with SHR and negation.  Now all right-shifts
with unsigned parameters are marked as not being able to have source
modifiers.

v3: Disallow negations on right shifts of unsigned sources by setting
the no_negations flag in add_candidate_immediate.  This eliminates the
need to exclude SHR in can_do_source_mods.

Tiger Lake
total instructions in shared programs: 21102817 -> 21099443 (-0.02%)
instructions in affected programs: 296796 -> 293422 (-1.14%)
helped: 92 / HURT: 356

total cycles in shared programs: 790564691 -> 790393358 (-0.02%)
cycles in affected programs: 36456886 -> 36285553 (-0.47%)
helped: 171 / HURT: 286

total spills in shared programs: 3951 -> 3959 (0.20%)
spills in affected programs: 176 -> 184 (4.55%)
helped: 0 / HURT: 2

total fills in shared programs: 2631 -> 2639 (0.30%)
fills in affected programs: 176 -> 184 (4.55%)
helped: 0 / HURT: 2

LOST:   0
GAINED: 4

Ice Lake
total instructions in shared programs: 19954204 -> 19949122 (-0.03%)
instructions in affected programs: 40301 -> 35219 (-12.61%)
helped: 23 / HURT: 2

total cycles in shared programs: 858377735 -> 858462082 (<.01%)
cycles in affected programs: 75537286 -> 75621633 (0.11%)
helped: 124 / HURT: 319

total spills in shared programs: 6255 -> 6190 (-1.04%)
spills in affected programs: 392 -> 327 (-16.58%)
helped: 1 / HURT: 2

total fills in shared programs: 7813 -> 7382 (-5.52%)
fills in affected programs: 942 -> 511 (-45.75%)
helped: 1 / HURT: 2

LOST:   0
GAINED: 3

Skylake
total instructions in shared programs: 18049362 -> 18044440 (-0.03%)
instructions in affected programs: 48317 -> 43395 (-10.19%)
helped: 26 / HURT: 2

total cycles in shared programs: 844884806 -> 844915655 (<.01%)
cycles in affected programs: 76137133 -> 76167982 (0.04%)
helped: 171 / HURT: 293

total spills in shared programs: 6148 -> 6149 (0.02%)
spills in affected programs: 595 -> 596 (0.17%)
helped: 4 / HURT: 2

total fills in shared programs: 7484 -> 7067 (-5.57%)
fills in affected programs: 1226 -> 809 (-34.01%)
helped: 4 / HURT: 2

LOST:   0
GAINED: 8

Broadwell
total instructions in shared programs: 17826844 -> 17821805 (-0.03%)
instructions in affected programs: 60687 -> 55648 (-8.30%)
helped: 28 / HURT: 8

total cycles in shared programs: 905332682 -> 904369499 (-0.11%)
cycles in affected programs: 76743509 -> 75780326 (-1.26%)
helped: 179 / HURT: 225

total spills in shared programs: 17922 -> 17908 (-0.08%)
spills in affected programs: 2495 -> 2481 (-0.56%)
helped: 6 / HURT: 8

total fills in shared programs: 26290 -> 25397 (-3.40%)
fills in affected programs: 2606 -> 1713 (-34.27%)
helped: 8 / HURT: 6

LOST:   1
GAINED: 1

Haswell
total instructions in shared programs: 16678878 -> 16674444 (-0.03%)
instructions in affected programs: 78458 -> 74024 (-5.65%)
helped: 87 / HURT: 6

total cycles in shared programs: 880189381 -> 880301043 (0.01%)
cycles in affected programs: 29956463 -> 30068125 (0.37%)
helped: 169 / HURT: 163

total spills in shared programs: 14428 -> 14378 (-0.35%)
spills in affected programs: 2384 -> 2334 (-2.10%)
helped: 8 / HURT: 6

total fills in shared programs: 16975 -> 16881 (-0.55%)
fills in affected programs: 1334 -> 1240 (-7.05%)
helped: 10 / HURT: 4

Ivy Bridge
total instructions in shared programs: 15706048 -> 15706035 (<.01%)
instructions in affected programs: 9941 -> 9928 (-0.13%)
helped: 13 / HURT: 0

total cycles in shared programs: 433618834 -> 433624637 (<.01%)
cycles in affected programs: 12926714 -> 12932517 (0.04%)
helped: 52 / HURT: 41

Sandy Bridge
total cycles in shared programs: 741223552 -> 741223443 (<.01%)
cycles in affected programs: 19814 -> 19705 (-0.55%)
helped: 14 / HURT: 0

No changes on Iron Lake or GM45

fossil-db changes:

Tiger Lake
Instructions in all programs: 156858030 -> 156905532 (+0.0%)
Instructions helped: 3915
Instructions hurt: 15411

Cycles in all programs: 7529667771 -> 7532117340 (+0.0%)
Cycles helped: 10260
Cycles hurt: 9990

Spills in all programs: 5610 -> 5457 (-2.7%)
Spills helped: 18

Fills in all programs: 6274 -> 6091 (-2.9%)
Fills helped: 18

Gained: 2
Lost: 16

Ice Lake
Instructions in all programs: 141308082 -> 141303083 (-0.0%)
Instructions helped: 574
Instructions hurt: 172

Cycles in all programs: 9091361325 -> 9094622766 (+0.0%)
Cycles helped: 8764
Cycles hurt: 11702

Spills in all programs: 7531 -> 7385 (-1.9%)
Spills helped: 19

Fills in all programs: 8462 -> 8294 (-2.0%)
Fills helped: 19

Gained: 22
Lost: 15

Skylake
Instructions in all programs: 131872162 -> 131867263 (-0.0%)
Instructions helped: 566
Instructions hurt: 172

Cycles in all programs: 8795095440 -> 8799676943 (+0.1%)
Cycles helped: 8333
Cycles hurt: 12182

Spills in all programs: 7006 -> 6884 (-1.7%)
Spills helped: 13

Fills in all programs: 7696 -> 7552 (-1.9%)
Fills helped: 13

Gained: 24
Lost: 1

Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
2023-08-29 19:01:36 +00:00