These have now been replaced by the MEMORY_*_LOGICAL opcodes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
This is a new unified set of opcodes for memory access loosely patterned
after the new LSC-style data port messages introduced on Alchemist GPUs.
Rather than creating an opcode for every type of memory access, it has
only three opcodes: load, store, and atomic. It has various sources to
indicate the rest:
- Binding type (raw pointer, pointer to surface state, or BT index)
- Address size (A64, A32, A16)
- Data size (bit size, number of components)
- Opcode (atomic opcode, or LOAD/STORE vs. LOAD_CMASK/STORE_CMASK)
- Mode (typed vs. untyped vs. shared-local vs. scratch)
- Address (and its dimensionality)
- Data (0 for loads, 1 for stores, 2 for atomics)
- Whether we want block access
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
This prevents some regressions later in the MR. Once load_const
operations are marked as is_scalar, they will cesase to get the
automatic constant propagation that occurs in try_rebuild_source.
No shader-db or fossil-db changes on any Intel platform.
v2: Slightly relax source restrictions on
SHADER_OPCODE_UNALIGNED_OWORD_BLOCK_READ_LOGICAL. Add a comment
explaining the restriction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
The specific pattern from the unit test was observed in ray tracing
trampoline shaders.
v2: Refactor the is_raw_move tests out to a utility function. Suggested
by Ken.
v3: Fix a regression caused by being too picky about source
modifiers. This was introduced somewhere between when I did initial
shader-db runs an v2.
v4: Fix typo in comment. Noticed by Caio.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19734086 -> 19733997 (<.01%)
instructions in affected programs: 135388 -> 135299 (-0.07%)
helped: 76 / HURT: 2
total cycles in shared programs: 916290451 -> 916264968 (<.01%)
cycles in affected programs: 41046002 -> 41020519 (-0.06%)
helped: 32 / HURT: 29
fossil-db:
Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531355 -> 151513669 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17209372399 -> 17208178205 (-0.01%); split: -0.01%, +0.00%
Max live registers: 32016490 -> 32016493 (+0.00%)
Totals from 17361 (2.75% of 630198) affected shaders:
Instrs: 2642048 -> 2624362 (-0.67%); split: -0.67%, +0.00%
Cycle count: 79803066 -> 78608872 (-1.50%); split: -1.75%, +0.25%
Max live registers: 421668 -> 421671 (+0.00%)
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149995644 -> 149977326 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15567293770 -> 15566524840 (-0.00%); split: -0.02%, +0.01%
Spill count: 61241 -> 61238 (-0.00%)
Fill count: 107304 -> 107301 (-0.00%)
Max live registers: 31993109 -> 31993112 (+0.00%)
Totals from 17813 (2.83% of 629912) affected shaders:
Instrs: 3738236 -> 3719918 (-0.49%); split: -0.49%, +0.00%
Cycle count: 4251157049 -> 4250388119 (-0.02%); split: -0.06%, +0.04%
Spill count: 28268 -> 28265 (-0.01%)
Fill count: 50377 -> 50374 (-0.01%)
Max live registers: 470648 -> 470651 (+0.00%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
We could be trying to extract a D/UD from a Q/UQ, for example. We were
ignoring the top 32-bits, which is incorrect.
Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>
This function never expands a type - it only narrows it. As such, we
don't need to ever sign extend to fill additional new bits. I think
this code was left over from earlier versions of my optimization pass
that was buggy and trying to handle cases it should not have.
Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>
Doxygen documentation says
> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.
so we can omit the filename itself when using the annotation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>
Copy propagation often eliminates all uses of an instruction. If we
detect that we've done so, we can eliminate the instruction ourselves
rather than leaving it hanging until the next DCE pass.
This saves some CPU time as other passes don't see dead code.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
This will let us reuse the bulk of this code in a new copy propagation
pass without replicating it. We retain a wrapper function for dealing
with ACP entries, which the new pass won't have.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
This was for Sandybridge's IF with embedded comparison, which only
existed for a single generation of hardware. Since the compiler fork,
we no longer support Sandybridge here.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
No shader-db or fossil-db changes on any Intel platform. This ends up
begin helpful in "intel/brw: Use range analysis to optimize fsign."
v2: Add integer CSEL support
v3: Massive simplification (-20 lines!) of constant propagation
logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm.
Noticed by Ken.
v4: While MAD can mix F and HF sources on some platforms, CSEL
cannot. Found by skqp on TGL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
Both of these helpers do the same thing. We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
This makes sure that copy propagation doesn't undo the lowering of
restricted sub-dword integer regions done by brw_fs_lower_regioning().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
HF sources to math instructions cannot be scalar. This is very similar
to an old Gfx6 restriction on POW, so let's fix it in a similar way.
As an extra bit of saftey, lower any occurances that might slip through
in brw_fs_lower_regioning.
The primary change is to prevent copy propagation from violating the
restriction. With that change, nothing should be able to generate these
invalid source strides. The modification to fs_visitor::validate should
detect potential problems sooner rather than later.
Previous attempts to implement this Wa when emitting the math
instruction (in brw_eu_emit.c gfx6_math) didn't work for several
reasons. The lowering happens after the SWSB pass, so the scoreboarding
was incorrect (thanks to Curro for finding that). In addition, the
lowering happens after register allocation, so it's impossible to
allocate a non-scalar register to expand the scalar value.
Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL.
v2: Add changes to brw_fs_lower_regioning. Suggested by Curro.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>
We were asserting that entry->dst.offset % REG_SIZE == 0, which is
easily tripped by a simple LOAD_PAYLOAD that writes a 16-bit vec2:
load_payload(8) vgrf1:UW, vgrf2+0.0:UW, vgrf3+0.0:UW
We create separate ACP entries corresponding to the values coming from
vgrf2 and vgrf3, with entry->dst set to the location within vgrf1 where
those sources get written to. So the second entry will have offset 16,
which is not REG_SIZE aligned.
It looks like this assert was originally added back in 2014 (see commit
1728e74957) and adjusted through the ages,
including at a point when we combined reg and subreg offsets into a
single byte offset, and over time also extended copy propagation.
Here the destination offset is already accounted for via rel_offset,
at the byte offset level, so things ought to work and there is no need
to assert that this is the case. Ian had already noted that the assert
tripped in commit e3f502e007, but checking
for inst->opcode == MOV here doesn't really make sense - it's just the
case that he found that broke.
Remove the erroneous assertion.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
I tried this when I was working on MR !7698, and it didn't have much
affect back then. Maybe I've added more stuff to my fossil-db?
Gfx12 platforms (Tiger Lake and DG2) are unaffected because the POW
instruction was removed.
shader-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20301933 -> 20301900 (<.01%)
instructions in affected programs: 9077 -> 9044 (-0.36%)
helped: 33 / HURT: 0
total cycles in shared programs: 842797624 -> 842799471 (<.01%)
cycles in affected programs: 1361911 -> 1363758 (0.14%)
helped: 35 / HURT: 111
LOST: 0
GAINED: 9
fossil-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 165510222 -> 165510163 (-0.00%)
Cycles: 15125195835 -> 15125194484 (-0.00%); split: -0.00%, +0.00%
Spill count: 45204 -> 45196 (-0.02%)
Fill count: 74157 -> 74149 (-0.01%)
Totals from 65 (0.01% of 656118) affected shaders:
Instrs: 57426 -> 57367 (-0.10%)
Cycles: 1667918 -> 1666567 (-0.08%); split: -0.11%, +0.03%
Spill count: 137 -> 129 (-5.84%)
Fill count: 515 -> 507 (-1.55%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
The majority of cases that would have been affected by this actually
had both sources as integer constants. The earlier commit "intel/rt:
Don't directly generate umul_32x16" allowed those to be constant
folded.
v2: Move the a*-1 block to be near the existing a*-1 block.
No shader-db changes on any Intel platform.
fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165510246 -> 165510222 (-0.00%)
Cycles: 15125198238 -> 15125195835 (-0.00%); split: -0.00%, +0.00%
Totals from 46 (0.01% of 656118) affected shaders:
Instrs: 36010 -> 35986 (-0.07%)
Cycles: 2613658 -> 2611255 (-0.09%); split: -0.17%, +0.07%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
We were handling FIXED_GRF, but we probably also ought to handle ATTR
(pushed inputs) and UNIFORM (pushed constants). Just check if file
isn't VGRF to handle everything.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
In practice it seems we are always entering here, haven't looked
in detail whether at this point we could just assert. But for now
only allocate a new acp_entry if we are going to add it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25670>
Since the <8;8,0> regions they use in multipolygon mode could violate
regioning restrictions in some cases, depending on the execution type
of the instruction. Note that the assertion is removed from
try_copy_propagate() since a more accurate check is used within that
function than what fs_inst::can_change_types() can do.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
This is kept as a separate commit because the change looks like a lot
more than it it. The order of the two loops is swapped, then the two
loops are merged.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
Using a single data structure seems better. There's no appreciable
performance change. On batman_arkham_city_goty.foz, the difference
reported was 0.48%±0.36% (n=20). Several commits in the MR, including
some that should have no effect at all, reported similar changes. I
attribute this primarily changing of loop alignments and similar.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
On batman_arkham_city_goty.foz, this improves fossil-db time by
-3.83%±0.24% (n=20). This fossil takes the longest time of any in my
database.
v2: Add some comments for cmp_entry_src_entry_src and
cmp_entry_src_nr. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
The larger predicate here already requires that inst->opcode must be
BRW_OPCODE_MOV, so it can't BRW_OPCODE_SEL. With that removed, the
other simplifications are pretty straight forward.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
The caller already loops over the sources. This means that the caller
must loop over the sources in reverse because constant propagation
prefers to propagate into the last sources first.
The shader-db and fossil-db changes (below) are all due to SEL
instructions. Changing the order sources are visited changes whether a
SEL with two immediate sources is
(+f0.0) sel g12 IMM_A IMM_B
or
(-f0.0) sel g12 IMM_B IMM_A
The ordering of the sources affects the order the constant combining
encounters the values, and the determines which value is "combined"
and which value remains an immediate.
This affects the results by luck. If there are two instructions:
(+f0.0) sel g12 IMM_A IMM_B
(+f0.0) sel g13 IMM_A IMM_C
Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the
selection algorithm in constant combining is greedy, this case
requires the algorithm see the values in just the right order for the
right thing to happen.
v2: Rebase on many, many changes. Move instruction source fixup
reordering out or try_constant_propagate.
v3: Rebase on !7698.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>