We were handling FIXED_GRF, but we probably also ought to handle ATTR
(pushed inputs) and UNIFORM (pushed constants). Just check if file
isn't VGRF to handle everything.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
In practice it seems we are always entering here, haven't looked
in detail whether at this point we could just assert. But for now
only allocate a new acp_entry if we are going to add it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25670>
Since the <8;8,0> regions they use in multipolygon mode could violate
regioning restrictions in some cases, depending on the execution type
of the instruction. Note that the assertion is removed from
try_copy_propagate() since a more accurate check is used within that
function than what fs_inst::can_change_types() can do.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
This is kept as a separate commit because the change looks like a lot
more than it it. The order of the two loops is swapped, then the two
loops are merged.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
Using a single data structure seems better. There's no appreciable
performance change. On batman_arkham_city_goty.foz, the difference
reported was 0.48%±0.36% (n=20). Several commits in the MR, including
some that should have no effect at all, reported similar changes. I
attribute this primarily changing of loop alignments and similar.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
On batman_arkham_city_goty.foz, this improves fossil-db time by
-3.83%±0.24% (n=20). This fossil takes the longest time of any in my
database.
v2: Add some comments for cmp_entry_src_entry_src and
cmp_entry_src_nr. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
The larger predicate here already requires that inst->opcode must be
BRW_OPCODE_MOV, so it can't BRW_OPCODE_SEL. With that removed, the
other simplifications are pretty straight forward.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
The caller already loops over the sources. This means that the caller
must loop over the sources in reverse because constant propagation
prefers to propagate into the last sources first.
The shader-db and fossil-db changes (below) are all due to SEL
instructions. Changing the order sources are visited changes whether a
SEL with two immediate sources is
(+f0.0) sel g12 IMM_A IMM_B
or
(-f0.0) sel g12 IMM_B IMM_A
The ordering of the sources affects the order the constant combining
encounters the values, and the determines which value is "combined"
and which value remains an immediate.
This affects the results by luck. If there are two instructions:
(+f0.0) sel g12 IMM_A IMM_B
(+f0.0) sel g13 IMM_A IMM_C
Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the
selection algorithm in constant combining is greedy, this case
requires the algorithm see the values in just the right order for the
right thing to happen.
v2: Rebase on many, many changes. Move instruction source fixup
reordering out or try_constant_propagate.
v3: Rebase on !7698.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
If the linked list structure used depended on the list head to know when
to terminate, this would be a pretty serious bug. If try_constant_propage
or try_copy_propagate make progress, inst->src[i].nr will change. This
results in the foreach_in_list using a different list header on later
iterations of the loop.
This causes two shaders in shader-db and 9 shaders in fossil-db to
change. Looking at the code changes, these are cases where there was a
copy of a copy that gets propagated. The part that confuses me is the
VGRF numbers involved should **not** hash to the same bucket, so it
should be impossible to find the original source from the intermediate
VGRF.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
It is very common to have bcsel where the second and third sources are
both constants. This results in a situation where we would want to emit
a SEL with two constant sources, but that's not allowed.
Previously, we would load both constants into registers, then let
constant propagation copy the last constant into the SEL instruction.
This results in the constant using an entire SIMD register instead of a
single channel.
Instead, copy propagate both sources, then let the combine-constants
pass do its thing. In the worst case, this stores the constant in a
single channel of the SIMD register. In the best case, it reuses a
value that was loaded into a register to satisfy another instruction.
shader-db results:
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951549 -> 19948709 (-0.01%)
instructions in affected programs: 482795 -> 479955 (-0.59%)
helped: 1184 / HURT: 3
total cycles in shared programs: 858584724 -> 858205341 (-0.04%)
cycles in affected programs: 356168375 -> 355788992 (-0.11%)
helped: 1448 / HURT: 1195
total spills in shared programs: 6569 -> 6255 (-4.78%)
spills in affected programs: 912 -> 598 (-34.43%)
helped: 58 / HURT: 0
total fills in shared programs: 8218 -> 7813 (-4.93%)
fills in affected programs: 1570 -> 1165 (-25.80%)
helped: 58 / HURT: 0
LOST: 6
GAINED: 16
Broadwell
total instructions in shared programs: 17819660 -> 17819389 (<.01%)
instructions in affected programs: 1078129 -> 1077858 (-0.03%)
helped: 1067 / HURT: 304
total cycles in shared programs: 904722624 -> 905035016 (0.03%)
cycles in affected programs: 362583117 -> 362895509 (0.09%)
helped: 1381 / HURT: 1123
total spills in shared programs: 17884 -> 17922 (0.21%)
spills in affected programs: 5088 -> 5126 (0.75%)
helped: 55 / HURT: 152
total fills in shared programs: 25533 -> 26290 (2.96%)
fills in affected programs: 12992 -> 13749 (5.83%)
helped: 61 /HURT: 295
LOST: 7
GAINED: 24
Haswell
total instructions in shared programs: 16678080 -> 16673976 (-0.02%)
instructions in affected programs: 1162893 -> 1158789 (-0.35%)
helped: 1584 / HURT: 7
total cycles in shared programs: 880180082 -> 879932525 (-0.03%)
cycles in affected programs: 364067522 -> 363819965 (-0.07%)
helped: 1226 / HURT: 976
total spills in shared programs: 14937 -> 14428 (-3.41%)
spills in affected programs: 7866 -> 7357 (-6.47%)
helped: 351 / HURT: 5
total fills in shared programs: 17572 -> 16975 (-3.40%)
fills in affected programs: 11028 -> 10431 (-5.41%)
helped: 350 / HURT: 3
LOST: 8
GAINED: 16
Ivy Bridge
total instructions in shared programs: 15704044 -> 15703158 (<.01%)
instructions in affected programs: 304513 -> 303627 (-0.29%)
helped: 707 / HURT: 0
total cycles in shared programs: 433560149 -> 433471118 (-0.02%)
cycles in affected programs: 19299650 -> 19210619 (-0.46%)
helped: 687 / HURT: 395
LOST: 2
GAINED: 9
Sandy Bridge
total instructions in shared programs: 13913386 -> 13912884 (<.01%)
instructions in affected programs: 195687 -> 195185 (-0.26%)
helped: 455 / HURT: 0
total cycles in shared programs: 741156272 -> 741136266 (<.01%)
cycles in affected programs: 10934349 -> 10914343 (-0.18%)
helped: 578 / HURT: 289
LOST: 9
GAINED: 4
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8364056 -> 8364042 (<.01%)
instructions in affected programs: 5178 -> 5164 (-0.27%)
helped: 10 / HURT: 0
total cycles in shared programs: 248759794 -> 248757940 (<.01%)
cycles in affected programs: 4305246 -> 4303392 (-0.04%)
helped: 183 / HURT: 24
fossil-db results:
Tiger Lake
Instructions in all programs: 156943594 -> 156802601 (-0.1%)
Instructions helped: 20595
Instructions hurt: 23248
Cycles in all programs: 7512086950 -> 7528386387 (+0.2%)
Cycles helped: 29531
Cycles hurt: 27837
Spills in all programs: 13500 -> 5643 (-58.2%)
Spills helped: 394
Spills hurt: 22
Fills in all programs: 18943 -> 6306 (-66.7%)
Fills helped: 394
Fills hurt: 11
Gained: 93
Lost: 76
Ice Lake
Instructions in all programs: 141395899 -> 141249621 (-0.1%)
Instructions helped: 30067
Instructions hurt: 3
Cycles in all programs: 9097127057 -> 9089668235 (-0.1%)
Cycles helped: 32268
Cycles hurt: 24315
Spills in all programs: 13695 -> 7564 (-44.8%)
Spills helped: 403
Fills in all programs: 18400 -> 8494 (-53.8%)
Fills helped: 403
Gained: 114
Lost: 137
Skylake
Instructions in all programs: 131948328 -> 131826063 (-0.1%)
Instructions helped: 29968
Instructions hurt: 3
Cycles in all programs: 8794778440 -> 8793934844 (-0.0%)
Cycles helped: 32705
Cycles hurt: 23575
Spills in all programs: 10526 -> 7039 (-33.1%)
Spills helped: 403
Fills in all programs: 11025 -> 7728 (-29.9%)
Fills helped: 403
Gained: 102
Lost: 250
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
This is a modified version of a commit originally in !7698. This version
add the changes to brw_fs_copy_propagation. If the address passed to
fs_visitor::swizzle_nir_scratch_addr is a constant, that function will
generate SHL with two constant sources.
DG2 uses a different path to generate those addresses, so the constant
folding can't occur there yet. That will be addressed in the next
commit.
What follows is the commit change history from that older MR.
v2: Previously this commit was after `intel/fs: Combine constants for
integer instructions too`. However, this commit can create invalid
instructions that are only cleaned up by `intel/fs: Combine constants
for integer instructions too`. That would potentially affect the
shader-db results of each commit, but I did not collect new data for
the reordering.
v3: Fix masking for W/UW and for Q/UQ types. Add an assertion for
!saturate. Both suggested by Ken. Also add an assertion that B/UB types
don't matically come back.
v4: Fix sources count. See also ed3c2f73db ("intel/fs: fixup sources
number from opt_algebraic").
v5: Fix typo in comment added in v3. Noticed by Marcin. Fix a typo in a
comment added when pulling this commit out of !7698. Noticed by Ken.
shader-db results:
DG2
No changes.
Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20655696 -> 20651648 (-0.02%)
instructions in affected programs: 23125 -> 19077 (-17.50%)
helped: 7 / HURT: 0
total cycles in shared programs: 858436639 -> 858407749 (<.01%)
cycles in affected programs: 8990532 -> 8961642 (-0.32%)
helped: 7 / HURT: 0
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18500780 -> 18496630 (-0.02%)
instructions in affected programs: 24715 -> 20565 (-16.79%)
helped: 7 / HURT: 0
total cycles in shared programs: 946100660 -> 946087688 (<.01%)
cycles in affected programs: 5838252 -> 5825280 (-0.22%)
helped: 7 / HURT: 0
total spills in shared programs: 17588 -> 17572 (-0.09%)
spills in affected programs: 1206 -> 1190 (-1.33%)
helped: 2 / HURT: 0
total fills in shared programs: 25192 -> 25156 (-0.14%)
fills in affected programs: 156 -> 120 (-23.08%)
helped: 2 / HURT: 0
No shader-db changes on any older Intel platforms.
fossil-db results:
DG2
Totals:
Instrs: 197780415 -> 197780372 (-0.00%); split: -0.00%, +0.00%
Cycles: 14066412266 -> 14066410782 (-0.00%); split: -0.00%, +0.00%
Totals from 16 (0.00% of 668055) affected shaders:
Instrs: 16420 -> 16377 (-0.26%); split: -0.43%, +0.17%
Cycles: 220133 -> 218649 (-0.67%); split: -0.69%, +0.01%
Tiger Lake, Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 153425977 -> 153423678 (-0.00%)
Cycles: 14747928947 -> 14747929547 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 8535968 -> 8535976 (+0.00%)
Send messages: 7697606 -> 7697607 (+0.00%)
Scratch Memory Size: 4380672 -> 4381696 (+0.02%)
Totals from 6 (0.00% of 662749) affected shaders:
Instrs: 13893 -> 11594 (-16.55%)
Cycles: 5386074 -> 5386674 (+0.01%); split: -0.42%, +0.43%
Subgroup size: 80 -> 88 (+10.00%)
Send messages: 675 -> 676 (+0.15%)
Scratch Memory Size: 91136 -> 92160 (+1.12%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
Code already exists to convert SHADER_OPCODE_SHUFFLE into a simple MOV
when either source is constant. However... the constants have to
actually get into those sources!
On a shader that I'm working on that multiplies very large matrices using
lots of subgroup operations,
-SIMD8 shader: 1378 instructions. 3 loops. 793896 cycles. 0:0 spills:fills, 23 sends, scheduled with mode non-lifo. Promoted 0 constants. Compacted 22048 to 21664 bytes (2%)
+SIMD8 shader: 346 instructions. 3 loops. 61742 cycles. 0:0 spills:fills, 23 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 5536 to 5216 bytes (6%)
No changes in shader-db or fossil-db on any Intel platform.
v2: Merge a bunch of identical cases. Suggested by Ken.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23609>
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.
v3: Remove redundant patterns. Noticed by Matt.
shader-db:
DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15
total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32
Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.
All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.
fossil-db:
DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210
Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922
Lost: 90
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
Since one of the register must always be either VGRF or FIXED_GRF, much
of regions_overlap and reg_offset can be elided.
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.29% ± 0.097% (n = 5, pooled s = 0.361697).
Using a release build, improves performance of compiling shaders from
batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s =
0.178312).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
There are already NIR algebraic optimizations (see also ac6646129f
("nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.") that
will try to remove the saturate from things like
fmax(0.5, fsat(x))
This basically reverts 40aeb558ce ("i965/fs: Allow propagation of
instructions with saturate flag to sel"). That commit message had no
shader-db information, so it's unclear whether this actually helped
anything ever.
No shader-db changes on any Intel platform.
One shader in Far Cry New Dawn was affected.
Cycles in all programs: 10933090738 -> 10933090736 (-0.0%)
Cycles helped: 1
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>
This fixes the behavior of copy propagation in cases where either the
source or destination of an ACP is overwritten elsewhere in the
program by a force_writemask_all instruction, which could cause the
overwrite to be executed for an inactive channel under non-uniform
control flow, causing the current per-channel dataflow propagation to
give incorrect results. This has been reported in cases like:
> while (true) {
> x = imageSize(img);
> if (non_uniform_condition()) {
> y = x;
> break;
> }
> }
> use(y);
Currently the copy propagation pass would propagate copy 'y = x' into
'use(y)', which is invalid since in the example above imageSize() is
implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence replacing 'y' with 'x' at 'use(y)' changes the behavior
of the program.
This patch extends the global dataflow analysis algorithm to determine
whether there is any control flow path from a given copy to an
overwrite of its source or destination which has force_writemask_all
behavior inconsistent with the copy, and in such case prevents copy
propagation for that ACP entry at any point of the program which can
be reached from the overwrite, even if the copy is statically
re-executed along all such control flow paths (as in the example
above), since the execution of the overwrite for a given channel i may
corrupt other channels j!=i inactive for the subsequently re-executed
copy.
Note that a simpler solution has been attempted which fully shuts down
copy propagation if such a force_writemask_all ACP overwrite is
present /anywhere/ in the program regardless of its location in the
control flow graph, however that led to large shader-db regressions in
some programs from shader-db (like a CS from Car Chase which would
emit 53% more instructions). With this solution the only handful of
shaders that suffer instruction count regressions seem to be getting
misoptimized right now (e.g. some compute shaders from Deus Ex
Mankind). This solution doesn't seem to affect the run-time of
shader-db significantly, it's less than 1% higher with the fix
applied.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
force_writemask_all determines whether all channels of the copy are
actually valid, and may be required to be set for it to be propagated
safely in cases where the destination of the copy is used by another
force_writemask_all instruction, or when the copy occurs in a
divergent control flow block different from its use.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
The ACP entries created by copy propagation to track the implied
copies of LOAD_PAYLOAD instructions don't model the behavior of
LOAD_PAYLOAD correctly, since (as of 41868bb682) header
moves are implicitly retyped to UD and the destination of non-header
copies implicitly uses the same type as the corresponding source, even
though the ACP entries created for such copies could incorrectly
represent a type conversion, which can lead to mis-optimization of the
program.
According to Marcin, this fixes the func.mesh.ext.workgroup_id.task.q0
crucible test.
Fixes: 41868bb682 ("i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction")
Reported-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18980>
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation. Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor. No need for a separate opcode.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
This enables copy propagation of
mov(8) g5<1>UD 0x00000180UD
mul(8) g10<1>D g2.3<0,1,0>D g5<16,8,2>W
into
mul(8) g10<1>D g2.3<0,1,0>D 180W
This is necessary for any optimization passes that generate imul_32x16
instructions.
No fossil-db or shader-db changes on any Intel platform.
v2: Fix type size check to (src size != 2) || (dest size != 4). It was
previously &&. :( This allowed copying constants into UB sources, and
that is invalid.
v3: Fix incorrect extraction of upper 16-bits of immediate value when
subnr=2. Noticed by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
Don't copy propagate the constant in situations like
mov(8) g8<1>D 0x7fffffffD
mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W
On platforms that only have a 32x16 multiplier, this will result in
lowering the multiply to
mul(8) g15<1>D g14<8,8,1>D 0xffffUW
mul(8) g16<1>D g14<8,8,1>D 0x7fffUW
add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW
On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in
mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD
Volume 2a of the Skylake PRM says:
When multiplying a DW and any lower precision integer, the
DW operand must on src0.
See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104.
Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I
don't think it would be possible to create a situation where this could
occur. I discovered this via some optimizations that can determine that
the non-constant source must be able to fit in 16-bits. The case listed
above came from piglit's "ext_transform_feedback-order arrays points"
with those optimizations in place.
No shader-db or fossil-db changes on any Intel platform.
Fixes: de6c0f8487 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
EOT messages need to use g112-g127 for their sources. With the new
opt_split_sends pass, we may be constructing an EOT message from two
different registers, and be able to copy propagate the original values
into those SENDs.
This can cause problems if we copy propagate from a large register
(say an RGBA value which is 4 GRFs in SIMD8 or 8 GRFs in SIMD16), in a
situation where the SEND only read a subset of that (say the alpha value
out of an RGBA texturing result). g112-127 can only hold 16 registers
worth of data, and sometimes we can only use g112-126. So, we can't
propagate if the GRFs in question are larger than 15 GRFs.
Fixes a shader validation failure in Alan Wake. Thanks to Ian Romanick
for catching this!
shader-db on Icelake shows that only SIMD32 programs are affected, and
the results are pretty negligable:
total instructions in shared programs: 19615228 -> 19615269 (<.01%)
instructions in affected programs: 10702 -> 10743 (0.38%)
helped: 1 / HURT: 43 / largest change: +/- 2 instructions
total cycles in shared programs: 852001706 -> 852001566 (<.01%)
cycles in affected programs: 767098 -> 766958 (-0.02%)
helped: 68 / HURT: 64 / largest change: +/- 774 cycles
GAINED: 2 / LOST: 0
Fixes: 589b03d02f ("intel/fs: Opportunistically split SEND message payloads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6803
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17390>
This structure will contain the opcode mapping tables in the next
commit. For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
This allows, for example, fs_inst::components_read() without passing
devinfo as extra argument.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
For SIMD8 half float payload, each component takes a full register, so
we can use existing LOAD_PAYLOAD infrastruture for required padding by
alternating plain 8-wide half float vector and null vector.
Also this patch removes an unwanted assertion from
opt_copy_propagation_local for LOAD_PAYLOAD.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>