This isn't used now, but future commits will add uses. Doing this as a
separate commit removes a lot of "just typing" churn from commits that
have real changes to review.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
This reduces the penalty the heuristic gives to SIMD32 shaders
relative to SIMD16 in presence of discard control flow on Xe2+. The
penalty was meant to account for the inefficient divergence behavior
of SIMD32 shaders on Gfx12.x platforms, since Gfx12 hardware had EUs
bundled in groups of two, and each pair shared control flow logic so
both EUs could only execute instructions in lockstep, which meant that
SIMD32 shaders had an effective warp size of 64 on Gfx12.x.
This change switches back to more optimistic modelling of discard
divergence. With it we gain about 6% performance in a Shadow of the
Tomb Raider trace (tested on BMG).
One may wonder if there are still workloads that would suffer
materially from enabling SIMD32 for all pixel shaders on Xe2 instead
of using this heuristic, since Xe2 EUs have twice the GRF space, twice
the FPU throughput and better divergence behavior than Xe, but the
answer seems to be yes unfortunately: E.g. Superposition has some
pixel shaders where SIMD32 has substantially worse scheduling due to
the increased number of false dependencies due to higher register
pressure, and using SIMD32 for them reduces performance significantly.
The heuristic seems to model this correctly so it doesn't look like we
can do without it at least right now on Xe2.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31697>
These were removed with Icelake. While they technically still exist on
Skylake, which this compiler supports, we have never used these opcodes
in the 14 years we could have done so. So just scrap them.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29665>
This opcode gets translated to 2 ALU instructions with dependency ALU
stall. This change reproduces the FS_OPCODE_PACK_HALF_2x16_SPLIT
values which is another opcode that generates 2 instructions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
Another architecture register that requires some care before reading.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 49ee3ae9e8 ("intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+")
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29319>
Both of these helpers do the same thing. We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN. The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
We already have a logical opcode and lower to what is basically a send
instruction. We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
We first generate the logical opcodes, and these days fully lower to
SHADER_OPCODE_SEND. In the past, we lowered to a non-logical variant
and handled that in the generator. These days, we were just using the
non-logical opcodes as an awkward intermediate opcode change during
the lowering...which isn't really necessary at all.
This patch eliminates them by using the original logical opcodes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
This doesn't help very much now. A later commit adds a NIR optimization
pass, tentatively called nir_opt_uniform_subgroup, that converts many
kinds of subgroup operations to things involving
bitCount(ballot(true)). This commit makes a huge difference in the
results of that later commit.
No shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165558033 -> 165557519 (-0.00%)
Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00%
Totals from 299 (0.05% of 656117) affected shaders:
Instrs: 88293 -> 87779 (-0.58%)
Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03%
v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.
v3: Prevent lower_regioning from trying to "fix" DPAS sources.
v4: Add instruction latency information for scheduling and perf
estimates.
v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.
v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.
The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.
No shader-db changes on TGL & DG2.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
Before rebasing on top of Ken's split-SEND optimization (see !17018),
this commit just caused some scheduling changes in various tessellation
and geometry shaders. These changes were caused by the addition of real
latency information for the URB messages.
With the addition of the split-SEND optimization, the changes
are... staggering. All of the shaders helped for spills and fills are
vertex shaders from Batman Arkham Origins. What surprises me is that
these shaders account for such a high percentage of the spills and fills
in fossil-db. 85%?!?
v2: Use FIXED_GRF instead of BRW_GENERAL_REGISTER_FILE in an assertion.
Suggested by Ken.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20013625 -> 19954020 (-0.30%)
instructions in affected programs: 4007157 -> 3947552 (-1.49%)
helped: 31161
HURT: 0
helped stats (abs) min: 1 max: 400 x̄: 1.91 x̃: 2
helped stats (rel) min: 0.08% max: 59.70% x̄: 2.20% x̃: 1.83%
95% mean confidence interval for instructions value: -1.97 -1.86
95% mean confidence interval for instructions %-change: -2.22% -2.18%
Instructions are helped.
total cycles in shared programs: 859337569 -> 858636788 (-0.08%)
cycles in affected programs: 74168298 -> 73467517 (-0.94%)
helped: 13812
HURT: 16846
helped stats (abs) min: 1 max: 291078 x̄: 82.83 x̃: 4
helped stats (rel) min: <.01% max: 37.09% x̄: 3.47% x̃: 2.02%
HURT stats (abs) min: 1 max: 1543 x̄: 26.31 x̃: 14
HURT stats (rel) min: <.01% max: 77.97% x̄: 4.11% x̃: 2.58%
95% mean confidence interval for cycles value: -55.10 9.39
95% mean confidence interval for cycles %-change: 0.62% 0.77%
Inconclusive result (value mean confidence interval includes 0).
Broadwell
total cycles in shared programs: 904844939 -> 904832320 (<.01%)
cycles in affected programs: 525360 -> 512741 (-2.40%)
helped: 215
HURT: 4
helped stats (abs) min: 4 max: 1018 x̄: 60.16 x̃: 39
helped stats (rel) min: 0.14% max: 15.85% x̄: 2.16% x̃: 2.04%
HURT stats (abs) min: 79 max: 79 x̄: 79.00 x̃: 79
HURT stats (rel) min: 1.31% max: 1.57% x̄: 1.43% x̃: 1.43%
95% mean confidence interval for cycles value: -75.02 -40.22
95% mean confidence interval for cycles %-change: -2.37% -1.81%
Cycles are helped.
No shader-db changes on any older Intel platforms.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 142622800 -> 141461114 (-0.8%)
Instructions helped: 197186
Cycles in all programs: 9101223846 -> 9099440025 (-0.0%)
Cycles helped: 37963
Cycles hurt: 151233
Spills in all programs: 98829 -> 13695 (-86.1%)
Spills helped: 2159
Fills in all programs: 128142 -> 18400 (-85.6%)
Fills helped: 2159
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
An argument could be made that all stage-specific opcodes for vec4
stages should be prefixed with VEC4_ like the stage-agnostic opcodes.
I'll leave those additional sed jobs for another day.
egrep -lr '(VS|GS|TCS)_OPCODE_URB_WRITE' src |\
while read f; do
sed --in-place 's/\(VS\|GS\|TCS\)_OPCODE_URB_WRITE/VEC4_\1_OPCODE_URB_WRITE/g' $f
done
egrep -lr 'T.S_OPCODE[_A-Z]*URB_OFFSETS' src |\
while read f; do
sed --in-place 's/\(T.S_OPCODE[_A-Z]*URB_OFFSETS\)/VEC4_\1/g' $f
done
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
This structure will contain the opcode mapping tables in the next
commit. For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V. However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().
We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We'll want different types of IDs based on topology. Let's make this
more flexible and also move the bit shifting code a layer above where
it's easier to do bitshifting operations, especially if you need to
stash things into temporary registers.
v2: Keep previous comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
This is by far the more common style in Mesa. It also gives a cue that
e.g. num_dependency_ids is a fixed definition rather than some kind of
local variable maintaining a count.
While hre, we also rename the enums to have full prefixes to prepare for
a future where we use them in multiple files for future backend work.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14182>
When they re-arranged all the dataport stuff and added the LSC, doing
URB fencing through the dataport no longer makes sense. Instead, there
is now a fence message on the URB shared function.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
v2: Very significant rebase on changes to previous commits.
Specifically, brw_fs_nir.cpp changes were pretty much rewritten from
scratch after changing the NIR opcode names and types.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented
using a separte cmpn and sel instruction. This lowering occurs in
fs_vistor::lower_minmax which is called very, very late... a long, long
time after the first calls to opt_cmod_propagation. As a result,
conditional modifiers can be incorrectly propagated across sel.cond on
those platforms.
No tests were affected by this change, and I find that quite shocking.
After just changing flags_written(), all of the atan tests started
failing on ILK. That required the change in cmod_propagatin (and the
addition of the prop_across_into_sel_gfx5 unit test).
Shader-db results for ILK and GM45 are below. I looked at a couple
before and after shaders... and every case that I looked at had
experienced incorrect cmod propagation. This affected a LOT of apps!
Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2,
Gang Beasts, and on and on... :(
I discovered this bug while working on a couple new optimization
passes. One of the passes attempts to remove condition modifiers that
are never used. The pass made no progress except on ILK and GM45.
After investigating a couple of the affected shaders, I noticed that
the code in those shaders looked wrong... investigation led to this
cause.
v2: Trivial changes in the unit tests.
v3: Fix type in comment in unit tests. Noticed by Jason and Priit.
v4: Tweak handling of BRW_OPCODE_SEL special case. Suggested by Jason.
Fixes: df1aec763e ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>
Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs) min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel) min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.
total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs) min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel) min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).
GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel) min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.
total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs) min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel) min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>