The larger predicate here already requires that inst->opcode must be
BRW_OPCODE_MOV, so it can't BRW_OPCODE_SEL. With that removed, the
other simplifications are pretty straight forward.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
The caller already loops over the sources. This means that the caller
must loop over the sources in reverse because constant propagation
prefers to propagate into the last sources first.
The shader-db and fossil-db changes (below) are all due to SEL
instructions. Changing the order sources are visited changes whether a
SEL with two immediate sources is
(+f0.0) sel g12 IMM_A IMM_B
or
(-f0.0) sel g12 IMM_B IMM_A
The ordering of the sources affects the order the constant combining
encounters the values, and the determines which value is "combined"
and which value remains an immediate.
This affects the results by luck. If there are two instructions:
(+f0.0) sel g12 IMM_A IMM_B
(+f0.0) sel g13 IMM_A IMM_C
Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the
selection algorithm in constant combining is greedy, this case
requires the algorithm see the values in just the right order for the
right thing to happen.
v2: Rebase on many, many changes. Move instruction source fixup
reordering out or try_constant_propagate.
v3: Rebase on !7698.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
If the linked list structure used depended on the list head to know when
to terminate, this would be a pretty serious bug. If try_constant_propage
or try_copy_propagate make progress, inst->src[i].nr will change. This
results in the foreach_in_list using a different list header on later
iterations of the loop.
This causes two shaders in shader-db and 9 shaders in fossil-db to
change. Looking at the code changes, these are cases where there was a
copy of a copy that gets propagated. The part that confuses me is the
VGRF numbers involved should **not** hash to the same bucket, so it
should be impossible to find the original source from the intermediate
VGRF.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
Unless the change in liveout also causes livein to change, updates to
liveout cannot have any global effect. Changes to livein already flag
additional interation.
I had additional changes in this area that didn't pan out. While working
on those change, I was a little confused about this bit of code. It's
unnecessary, so it's better to delete it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
GLSL doesn't use that type. SPIR-V used for a while but later started
relying on its own data structures and stopped using it.
See ca62e849d3 ("nir/spirv: Stop using glsl_type for function types")
If we were ever to add this one again, would be better to have a way to
grab a key for lookup that did not require allocations, right now that's
needed to inject return type as the first element in params array.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25160>
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.
Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>
The compaction introduced in a252123363 ("intel/compiler/mesh: compactify MUE layout")
is not suitable for the case where graphics pipeline libraries are fast
linked, as the fragment shader won't receive the mue_map to know where
to locate its inputs.
For that case, keep doing what we did before and lay things down in the
order varyings are defined, which is also how it works for the non-mesh
case.
Fixes dEQP-VK.fragment_shading_rate.*fast_linked_library*.ms
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>
Flushing and invalidating caches isn't necessary for workgroup scope
fences. In fact, the DP_FLUSH_TYPE docs (BSpec 54041) say:
"If the fence scope is Local or Threadgroup, HW ignores the flush
type and operates as if it was set to None(no flush)"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>
nir_lower_tex can lower tg4 coords into tg4 offset which on DG2+ we
also need to lower into constant offsets.
Unfortunately the nir_lower_tex pass is not able to lower the
instructions it itself generates, so the easy fix for when
nir_lower_tex lowers tg4 coords into tg4 offsets is to rerun the pass.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9735
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25015>
With -Dintel-clc=system, the build system will search for an `intel_clc`
binary and use it instead of building `intel_clc` itself.
This allows Intel Vulkan ray tracing support to be built when cross
compiling without terrible hacks (that would otherwise be necessary due
to `intel_clc`'s dependence on SPIRV-LLVM-Translator, libclc, clang, and
LLVM).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24983>
We use a non-uniform lowering loop in the backend which we can do
better in NIR because we can also use divergence analysis there.
This change also limits VGRF usage to a single VGRF to hold the sample
ID in the backend.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>
There is no restriction for query per sample positions from the
interpolator when in non-per-sample dispatch mode. But apparently
that's not giving us the expected values for fragment shaders compiled
without per-sample dispatch knowledge (graphics pipeline libraries).
So when per-sample dispatch is dynamic and we're doing at_sample
interpolation, turn the interpolation back into at_offset at runtime
when we detect that the fragment shader is not run per sample.
Fixes a bunch of dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d8dfd153c5 ("intel/fs: Make per-sample and coarse dispatch tri-state")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>
Each block is processed separately. VGRF channels that are allocated to
values that are only used in a particular block are made available in
other blocks.
This is almost always an improvement, but there are some pessimal cases
where it goes horribly wrong. Imagine a shader with two blocks. In
that shader, the first block has 5 constants used in the first block and
the second block. Three other constants are only used in the first
block. The second block has 15 constants that are used only in the
block. The static VGRF usage is 3 regardless of packing. However,
scheduling may be able to shorten the live range of the first VGRF when
it only has values that came from the first block (because three of the
values are dead on entry to the second block).
This used to occurs in a Mad Max shader on Broadwell. That shader
went from 0:0 spills:fills to 107:52. Some changes over the last
year, I'm assuming !13734, have prevented this case from occuring.
This change created a lot of churn on Haswell and Ivy Bridge. This
seems to be primarily due to all the extra constants used for coissue,
but I did not investigate very deeply. On older platforms, there were
no changes to spills or fills. As a result, this is only used on
Broadwell and newer platforms.
v2: Update expected checksum for pixmark-piano-v2.trace on
gl-zink-anv-tgl. See #9714 for more details.
shader-db results:
Tiger Lake
total instructions in shared programs: 21101332 -> 21102084 (<.01%)
instructions in affected programs: 863686 -> 864438 (0.09%)
helped: 463 / HURT: 437
total cycles in shared programs: 790573225 -> 790664391 (0.01%)
cycles in affected programs: 92546803 -> 92637969 (0.10%)
helped: 558 / HURT: 629
total spills in shared programs: 3959 -> 3951 (-0.20%)
spills in affected programs: 184 -> 176 (-4.35%)
helped: 2 / HURT: 0
total fills in shared programs: 2639 -> 2631 (-0.30%)
fills in affected programs: 184 -> 176 (-4.35%)
helped: 2 / HURT: 0
LOST: 1
GAINED: 5
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19945216 -> 19944711 (<.01%)
instructions in affected programs: 139569 -> 139064 (-0.36%)
helped: 66 / HURT: 3
total cycles in shared programs: 858410082 -> 857381323 (-0.12%)
cycles in affected programs: 383825958 -> 382797199 (-0.27%)
helped: 1012 / HURT: 1055
total spills in shared programs: 6190 -> 6116 (-1.20%)
spills in affected programs: 891 -> 817 (-8.31%)
helped: 66 / HURT: 3
total fills in shared programs: 7382 -> 7238 (-1.95%)
fills in affected programs: 1538 -> 1394 (-9.36%)
helped: 66 / HURT: 3
LOST: 5
GAINED: 8
Broadwell
total instructions in shared programs: 17820886 -> 17812515 (-0.05%)
instructions in affected programs: 800512 -> 792141 (-1.05%)
helped: 385 / HURT: 1
total cycles in shared programs: 904482935 -> 903102070 (-0.15%)
cycles in affected programs: 422427015 -> 421046150 (-0.33%)
helped: 1091 / HURT: 812
total spills in shared programs: 17908 -> 16576 (-7.44%)
spills in affected programs: 9459 -> 8127 (-14.08%)
helped: 386 / HURT: 0
total fills in shared programs: 25397 -> 22354 (-11.98%)
fills in affected programs: 15504 -> 12461 (-19.63%)
helped: 385 / HURT: 1
LOST: 2
GAINED: 2
No shader-db changes on Haswell or older platforms.
fossil-db results:
Tiger Lake
Instructions in all programs: 156881463 -> 156890970 (+0.0%)
Instructions helped: 9033
Instructions hurt: 10285
Cycles in all programs: 7532597466 -> 7529647924 (-0.0%)
Cycles helped: 10548
Cycles hurt: 13667
Spills in all programs: 5490 -> 5110 (-6.9%)
Spills helped: 100
Spills hurt: 3
Fills in all programs: 6123 -> 5752 (-6.1%)
Fills helped: 100
Fills hurt: 3
Gained: 17
Lost: 47
Ice Lake
Instructions in all programs: 141309644 -> 141309603 (-0.0%)
Instructions helped: 9
Instructions hurt: 4
Cycles in all programs: 9095812690 -> 9097008049 (+0.0%)
Cycles helped: 14288
Cycles hurt: 16381
Spills in all programs: 7418 -> 7404 (-0.2%)
Spills helped: 9
Spills hurt: 4
Fills in all programs: 8326 -> 8321 (-0.1%)
Fills helped: 9
Fills hurt: 4
Skylake
Instructions in all programs: 131872347 -> 131870690 (-0.0%)
Instructions helped: 111
Instructions hurt: 3
Cycles in all programs: 8800835649 -> 8802483884 (+0.0%)
Cycles helped: 9415
Cycles hurt: 9678
Spills in all programs: 6917 -> 6476 (-6.4%)
Spills helped: 111
Spills hurt: 3
Fills in all programs: 7584 -> 7354 (-3.0%)
Fills helped: 111
Fills hurt: 3
Lost: 5
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
It is very common to have bcsel where the second and third sources are
both constants. This results in a situation where we would want to emit
a SEL with two constant sources, but that's not allowed.
Previously, we would load both constants into registers, then let
constant propagation copy the last constant into the SEL instruction.
This results in the constant using an entire SIMD register instead of a
single channel.
Instead, copy propagate both sources, then let the combine-constants
pass do its thing. In the worst case, this stores the constant in a
single channel of the SIMD register. In the best case, it reuses a
value that was loaded into a register to satisfy another instruction.
shader-db results:
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951549 -> 19948709 (-0.01%)
instructions in affected programs: 482795 -> 479955 (-0.59%)
helped: 1184 / HURT: 3
total cycles in shared programs: 858584724 -> 858205341 (-0.04%)
cycles in affected programs: 356168375 -> 355788992 (-0.11%)
helped: 1448 / HURT: 1195
total spills in shared programs: 6569 -> 6255 (-4.78%)
spills in affected programs: 912 -> 598 (-34.43%)
helped: 58 / HURT: 0
total fills in shared programs: 8218 -> 7813 (-4.93%)
fills in affected programs: 1570 -> 1165 (-25.80%)
helped: 58 / HURT: 0
LOST: 6
GAINED: 16
Broadwell
total instructions in shared programs: 17819660 -> 17819389 (<.01%)
instructions in affected programs: 1078129 -> 1077858 (-0.03%)
helped: 1067 / HURT: 304
total cycles in shared programs: 904722624 -> 905035016 (0.03%)
cycles in affected programs: 362583117 -> 362895509 (0.09%)
helped: 1381 / HURT: 1123
total spills in shared programs: 17884 -> 17922 (0.21%)
spills in affected programs: 5088 -> 5126 (0.75%)
helped: 55 / HURT: 152
total fills in shared programs: 25533 -> 26290 (2.96%)
fills in affected programs: 12992 -> 13749 (5.83%)
helped: 61 /HURT: 295
LOST: 7
GAINED: 24
Haswell
total instructions in shared programs: 16678080 -> 16673976 (-0.02%)
instructions in affected programs: 1162893 -> 1158789 (-0.35%)
helped: 1584 / HURT: 7
total cycles in shared programs: 880180082 -> 879932525 (-0.03%)
cycles in affected programs: 364067522 -> 363819965 (-0.07%)
helped: 1226 / HURT: 976
total spills in shared programs: 14937 -> 14428 (-3.41%)
spills in affected programs: 7866 -> 7357 (-6.47%)
helped: 351 / HURT: 5
total fills in shared programs: 17572 -> 16975 (-3.40%)
fills in affected programs: 11028 -> 10431 (-5.41%)
helped: 350 / HURT: 3
LOST: 8
GAINED: 16
Ivy Bridge
total instructions in shared programs: 15704044 -> 15703158 (<.01%)
instructions in affected programs: 304513 -> 303627 (-0.29%)
helped: 707 / HURT: 0
total cycles in shared programs: 433560149 -> 433471118 (-0.02%)
cycles in affected programs: 19299650 -> 19210619 (-0.46%)
helped: 687 / HURT: 395
LOST: 2
GAINED: 9
Sandy Bridge
total instructions in shared programs: 13913386 -> 13912884 (<.01%)
instructions in affected programs: 195687 -> 195185 (-0.26%)
helped: 455 / HURT: 0
total cycles in shared programs: 741156272 -> 741136266 (<.01%)
cycles in affected programs: 10934349 -> 10914343 (-0.18%)
helped: 578 / HURT: 289
LOST: 9
GAINED: 4
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8364056 -> 8364042 (<.01%)
instructions in affected programs: 5178 -> 5164 (-0.27%)
helped: 10 / HURT: 0
total cycles in shared programs: 248759794 -> 248757940 (<.01%)
cycles in affected programs: 4305246 -> 4303392 (-0.04%)
helped: 183 / HURT: 24
fossil-db results:
Tiger Lake
Instructions in all programs: 156943594 -> 156802601 (-0.1%)
Instructions helped: 20595
Instructions hurt: 23248
Cycles in all programs: 7512086950 -> 7528386387 (+0.2%)
Cycles helped: 29531
Cycles hurt: 27837
Spills in all programs: 13500 -> 5643 (-58.2%)
Spills helped: 394
Spills hurt: 22
Fills in all programs: 18943 -> 6306 (-66.7%)
Fills helped: 394
Fills hurt: 11
Gained: 93
Lost: 76
Ice Lake
Instructions in all programs: 141395899 -> 141249621 (-0.1%)
Instructions helped: 30067
Instructions hurt: 3
Cycles in all programs: 9097127057 -> 9089668235 (-0.1%)
Cycles helped: 32268
Cycles hurt: 24315
Spills in all programs: 13695 -> 7564 (-44.8%)
Spills helped: 403
Fills in all programs: 18400 -> 8494 (-53.8%)
Fills helped: 403
Gained: 114
Lost: 137
Skylake
Instructions in all programs: 131948328 -> 131826063 (-0.1%)
Instructions helped: 29968
Instructions hurt: 3
Cycles in all programs: 8794778440 -> 8793934844 (-0.0%)
Cycles helped: 32705
Cycles hurt: 23575
Spills in all programs: 10526 -> 7039 (-33.1%)
Spills helped: 403
Fills in all programs: 11025 -> 7728 (-29.9%)
Fills helped: 403
Gained: 102
Lost: 250
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
The is a squash of what in the original MR was "util: Add generic pass
that tries to combine constants" and "intel/fs: Switch to using
util_combine_constants".
The new algorithm uses a multi-pass greedy algorithm that attempts to
collect constants for loading in order of increasing degrees of freedom.
The first pass collects constants that must be emitted as-is (e.g.,
without source modifiers).
The second pass emits all constants that must be emitted (because they
are used in a source field that cannot be a literal constant) but that
can have a source modifier.
The final pass possibly emits constants that may not have to be emitted.
This is used for instructions where one of the fields is allowed to be a
constant. This is not used in the current commit, but future commits
that enable SEL will use this. The SEL instruction can have a single
constant, but when both sources are constant, one of the sources has to
be loaded into a register.
By loading constants in this order, required "choices" made in earlier
passes may be re-used in later passes. This provides a more optimal
result.
At this point in the series, most platforms have the same results with
the new implementation. Gen7 platforms see a significant number of
"small" changes. Due to the coissue optimization on Gen7, each shader
is likely to have most constants affected by constant combining.
If a shader has only a single basic block, constants are packed into
registers in the order produced by the constant combining process.
Since each constant has a different live range in the shader, even
slightly different packing orders can have dramatic effects on the live
range of a register. Even in cases where this does not affect register
pressure in a meaningful way, it can cause the scheduler to make very
different choices about the ordering of instructions.
From my analysis (using the `if (debug) { ... }` block at the end of
fs_visitor::opt_combine_constants), the old implementation and the new
implementation pick the same set of constants, but the order produced
may be slightly different. For the smaller number of values in non-Gfx7
shaders, the orders are similar enough to not matter.
No shader-db or fossil-db changes on any non-Gfx7 platforms.
Haswell and Ivy Bridge had similar results. (Haswell shown)
total cycles in shared programs: 879930036 -> 880001666 (<.01%)
cycles in affected programs: 22485040 -> 22556670 (0.32%)
helped: 1879
HURT: 2309
helped stats (abs) min: 1 max: 6296 x̄: 258.54 x̃: 34
helped stats (rel) min: <.01% max: 54.63% x̄: 3.88% x̃: 0.87%
HURT stats (abs) min: 1 max: 9739 x̄: 241.41 x̃: 40
HURT stats (rel) min: <.01% max: 160.50% x̄: 6.01% x̃: 0.99%
95% mean confidence interval for cycles value: -1.04 35.25
95% mean confidence interval for cycles %-change: 1.23% 1.92%
Inconclusive result (value mean confidence interval includes 0).
LOST: 82
GAINED: 39
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
We try various pre-RA scheduler modes and see if any of them allow
us to register allocate without spilling. If all of them spill,
however, we left it on the last mode: LIFO. This is unfortunately
sometimes significantly worse than other modes (such as "none").
This patch makes us instead select the pre-RA scheduling mode that
gives the lowest register pressure estimate, if none of them manage
to avoid spilling. The hope is that this scheduling will spill the
least out of all of them.
fossil-db stats (on Alchemist) speak for themselves:
Totals:
Instrs: 197297092 -> 195326552 (-1.00%); split: -1.02%, +0.03%
Cycles: 14291286956 -> 14303502596 (+0.09%); split: -0.55%, +0.64%
Spill count: 190886 -> 129204 (-32.31%); split: -33.01%, +0.70%
Fill count: 361408 -> 225038 (-37.73%); split: -39.17%, +1.43%
Scratch Memory Size: 12935168 -> 10868736 (-15.98%); split: -16.08%, +0.10%
Totals from 1791 (0.27% of 668386) affected shaders:
Instrs: 7628929 -> 5658389 (-25.83%); split: -26.50%, +0.67%
Cycles: 719326691 -> 731542331 (+1.70%); split: -10.95%, +12.65%
Spill count: 110627 -> 48945 (-55.76%); split: -56.96%, +1.20%
Fill count: 221560 -> 85190 (-61.55%); split: -63.89%, +2.34%
Scratch Memory Size: 4471808 -> 2405376 (-46.21%); split: -46.51%, +0.30%
Improves performance when using XeSS in Cyberpunk 2077 by 90% on A770.
Improves performance of Borderlands 3 by 1.54% on A770.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
This moves a bit of code out of a large function, but also lets us reuse
it a few extra places in the next commit.
I opted to stop using ralloc here since this is short-lived data that
doesn't need to stick around for the rest of the compile, and it's easy
enough to free.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
pre_modes[] is an array with the modes ordered in our desired
preference. scheduler_mode_name[] was also in that order, and the two
had to be kept in sync. This is a little silly; we should just have
a mode enum -> string table and look it up via the enum.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
I'm going to introduce another call site for this function, and just
handling SCHEDULE_NONE in the scheduler itself makes more sense than
duplicating the logic.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
The register pressure analysis I wrote in 2013 only considered VGRFs,
and not other GRFs, such as payload registers and push constants. We
need to consider those too, because payload registers definitely occupy
space and add to pressure.
In 2015, Connor already made the scheduler account for this, so the only
real use for this is in shader statistic dumps and optimizer printouts.
But we should make it more accurate. (We will use it in more places
shortly, a few commits from now.)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
If the NIR_DEBUG_PRINT_INTERNAL flag is not set, don't print debugging
information for internal shaders in INTEL_DEBUG=optimizer dumps.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24684>
This reverts commit 6b494745be.
The logic is not entirely correct: the comparison is between two
static-analysis estimates of a dynamic system with variables that aren't
captured by the shader source, so using ">" will always have greater potential
to cause regressions whenever the performance difference between the two builds
is something not captured by the static model, no matter how much the model is
improved.
Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9262
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24615>
Generated mostly with sed:
sed -i -e 's/live_ssa_def/live_def/g' src/compiler/nir/nir.h src/compiler/nir/*.c
Plus three fixups in various Intel drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24703>