We weren't allowing immediates in BFE at all. Gfx12+ supports
immediates in src0 (value) and src2 (width), but not src1 (offset).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31437>
Xe2 increased the register size from 256-bits to 512-bits. So we can
store 32 16-bit values in a register, rather than 16 values. Prior to
this patch, we hadn't updated the pass, so the second half of each of
our registers was unused.
Backport-to: 24.2
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
Doxygen documentation says
> If the file name is omitted (i.e. the line after \file is left
> blank) then the documentation block that contains the \file command will
> belong to the file it is located in.
so we can omit the filename itself when using the annotation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30168>
No shader-db or fossil-db changes on any Intel platform. This ends up
begin helpful in "intel/brw: Use range analysis to optimize fsign."
v2: Add integer CSEL support
v3: Massive simplification (-20 lines!) of constant propagation
logic. Suggested by Ken. Add missing CSEL case in supports_src_as_imm.
Noticed by Ken.
v4: While MAD can mix F and HF sources on some platforms, CSEL
cannot. Found by skqp on TGL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
Both of these helpers do the same thing. We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Icelake removed the PLN instruction for interpolating fragment shader
inputs, instead adding a special "Native Float" (NF) data type which
was a 66-bit floating point data type that could only be used with the
accumulator. On Tigerlake, they dropped NF support in favor of just
doing the interpolation with MAD instructions.
We stopped using NF years ago (commit 9ea90aae1e),
instead just using the fs_visitor::lower_linterp() pass to emit MADs.
Since this existed only for a short time, and had very limited utility,
we drop it from the compiler. One downside is that we can no longer
disassemble Icelake shaders containing NF types properly, but I doubt
anyone really minds.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
This is achieved by the following steps:
#ifndef DEBUG => #if !MESA_DEBUG
defined(DEBUG) => MESA_DEBUG
#ifdef DEBUG => #if MESA_DEBUG
This is done by replace in vscode
excludes
docs,*.rs,addrlib,src/imgui,*.sh,src/intel/vulkan/grl/gpu
These are safe because those files should keep DEBUG macro is already excluded;
and not directly replace DEBUG, as we have some symbols around it.
Use debug or NDEBUG instead of DEBUG in comments when proper
This for reduce the usage of DEBUG,
so it's easier migrating to MESA_DEBUG
These are found when migrating DEBUG to MESA_DEBUG,
these are all comment update, so it's safe
Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually
DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG
!DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG)
Replace the DEBUG present in comment with proper new MESA_DEBUG manually
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>
I tried this when I was working on MR !7698, and it didn't have much
affect back then. Maybe I've added more stuff to my fossil-db?
Gfx12 platforms (Tiger Lake and DG2) are unaffected because the POW
instruction was removed.
shader-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20301933 -> 20301900 (<.01%)
instructions in affected programs: 9077 -> 9044 (-0.36%)
helped: 33 / HURT: 0
total cycles in shared programs: 842797624 -> 842799471 (<.01%)
cycles in affected programs: 1361911 -> 1363758 (0.14%)
helped: 35 / HURT: 111
LOST: 0
GAINED: 9
fossil-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 165510222 -> 165510163 (-0.00%)
Cycles: 15125195835 -> 15125194484 (-0.00%); split: -0.00%, +0.00%
Spill count: 45204 -> 45196 (-0.02%)
Fill count: 74157 -> 74149 (-0.01%)
Totals from 65 (0.01% of 656118) affected shaders:
Instrs: 57426 -> 57367 (-0.10%)
Cycles: 1667918 -> 1666567 (-0.08%); split: -0.11%, +0.03%
Spill count: 137 -> 129 (-5.84%)
Fill count: 515 -> 507 (-1.55%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
The majority of cases that would have been affected by this actually
had both sources as integer constants. The earlier commit "intel/rt:
Don't directly generate umul_32x16" allowed those to be constant
folded.
v2: Move the a*-1 block to be near the existing a*-1 block.
No shader-db changes on any Intel platform.
fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165510246 -> 165510222 (-0.00%)
Cycles: 15125198238 -> 15125195835 (-0.00%); split: -0.00%, +0.00%
Totals from 46 (0.01% of 656118) affected shaders:
Instrs: 36010 -> 35986 (-0.07%)
Cycles: 2613658 -> 2611255 (-0.09%); split: -0.17%, +0.07%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
When building the CFG the instructions are taken of the list in
fs_visitor and added to the lists inside each block. The single
"exec_node" in the instruction is used for those memberships.
In the case the pass rebuilt the CFG, it had no instructions, so
calculate_cfg() had nothing to work with. For now fix the bug by
pulling all the instructions back to the original list.
We can do better here, but punting until upcoming work on
CFG itself.
Issue found in an unpublished CTS test. Small reproduction in our
unit tests now enabled.
Fixes: 65237f8bbc ("intel/fs: Don't add MOV instructions to DO blocks in combine constants")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27131>
This will allow fs_builder have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
The remaining users can simply create a new builder at_end() if needed.
In many places a new builder object is already being constructed, so
just give more specific instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
There was a subtle bug related to CFG tracking. Namely, some branch
instructions may point *only* to the block after the DO instruction
for the loop. If the MOV instructions are in the DO block, the may not
have liveness properly tracked.
Like in !25132, having the MOV instructions in blocks that might
contain other instructions helps scheduling.
shader-db:
All Broadwell and newer Intel GPUs had similar results (Ice Lake shown)
total cycles in shared programs: 848577248 -> 848557268 (<.01%)
cycles in affected programs: 78256396 -> 78236416 (-0.03%)
helped: 361 / HURT: 18
fossil-db:
All Skylake and newer Intel GPUs had similar results (Ice Lake shown)
Totals:
Cycles: 15021501924 -> 15021372904 (-0.00%); split: -0.00%, +0.00%
Totals from 735 (0.11% of 656080) affected shaders:
Cycles: 676429502 -> 676300482 (-0.02%); split: -0.02%, +0.00%
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26439>
Each block is processed separately. VGRF channels that are allocated to
values that are only used in a particular block are made available in
other blocks.
This is almost always an improvement, but there are some pessimal cases
where it goes horribly wrong. Imagine a shader with two blocks. In
that shader, the first block has 5 constants used in the first block and
the second block. Three other constants are only used in the first
block. The second block has 15 constants that are used only in the
block. The static VGRF usage is 3 regardless of packing. However,
scheduling may be able to shorten the live range of the first VGRF when
it only has values that came from the first block (because three of the
values are dead on entry to the second block).
This used to occurs in a Mad Max shader on Broadwell. That shader
went from 0:0 spills:fills to 107:52. Some changes over the last
year, I'm assuming !13734, have prevented this case from occuring.
This change created a lot of churn on Haswell and Ivy Bridge. This
seems to be primarily due to all the extra constants used for coissue,
but I did not investigate very deeply. On older platforms, there were
no changes to spills or fills. As a result, this is only used on
Broadwell and newer platforms.
v2: Update expected checksum for pixmark-piano-v2.trace on
gl-zink-anv-tgl. See #9714 for more details.
shader-db results:
Tiger Lake
total instructions in shared programs: 21101332 -> 21102084 (<.01%)
instructions in affected programs: 863686 -> 864438 (0.09%)
helped: 463 / HURT: 437
total cycles in shared programs: 790573225 -> 790664391 (0.01%)
cycles in affected programs: 92546803 -> 92637969 (0.10%)
helped: 558 / HURT: 629
total spills in shared programs: 3959 -> 3951 (-0.20%)
spills in affected programs: 184 -> 176 (-4.35%)
helped: 2 / HURT: 0
total fills in shared programs: 2639 -> 2631 (-0.30%)
fills in affected programs: 184 -> 176 (-4.35%)
helped: 2 / HURT: 0
LOST: 1
GAINED: 5
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19945216 -> 19944711 (<.01%)
instructions in affected programs: 139569 -> 139064 (-0.36%)
helped: 66 / HURT: 3
total cycles in shared programs: 858410082 -> 857381323 (-0.12%)
cycles in affected programs: 383825958 -> 382797199 (-0.27%)
helped: 1012 / HURT: 1055
total spills in shared programs: 6190 -> 6116 (-1.20%)
spills in affected programs: 891 -> 817 (-8.31%)
helped: 66 / HURT: 3
total fills in shared programs: 7382 -> 7238 (-1.95%)
fills in affected programs: 1538 -> 1394 (-9.36%)
helped: 66 / HURT: 3
LOST: 5
GAINED: 8
Broadwell
total instructions in shared programs: 17820886 -> 17812515 (-0.05%)
instructions in affected programs: 800512 -> 792141 (-1.05%)
helped: 385 / HURT: 1
total cycles in shared programs: 904482935 -> 903102070 (-0.15%)
cycles in affected programs: 422427015 -> 421046150 (-0.33%)
helped: 1091 / HURT: 812
total spills in shared programs: 17908 -> 16576 (-7.44%)
spills in affected programs: 9459 -> 8127 (-14.08%)
helped: 386 / HURT: 0
total fills in shared programs: 25397 -> 22354 (-11.98%)
fills in affected programs: 15504 -> 12461 (-19.63%)
helped: 385 / HURT: 1
LOST: 2
GAINED: 2
No shader-db changes on Haswell or older platforms.
fossil-db results:
Tiger Lake
Instructions in all programs: 156881463 -> 156890970 (+0.0%)
Instructions helped: 9033
Instructions hurt: 10285
Cycles in all programs: 7532597466 -> 7529647924 (-0.0%)
Cycles helped: 10548
Cycles hurt: 13667
Spills in all programs: 5490 -> 5110 (-6.9%)
Spills helped: 100
Spills hurt: 3
Fills in all programs: 6123 -> 5752 (-6.1%)
Fills helped: 100
Fills hurt: 3
Gained: 17
Lost: 47
Ice Lake
Instructions in all programs: 141309644 -> 141309603 (-0.0%)
Instructions helped: 9
Instructions hurt: 4
Cycles in all programs: 9095812690 -> 9097008049 (+0.0%)
Cycles helped: 14288
Cycles hurt: 16381
Spills in all programs: 7418 -> 7404 (-0.2%)
Spills helped: 9
Spills hurt: 4
Fills in all programs: 8326 -> 8321 (-0.1%)
Fills helped: 9
Fills hurt: 4
Skylake
Instructions in all programs: 131872347 -> 131870690 (-0.0%)
Instructions helped: 111
Instructions hurt: 3
Cycles in all programs: 8800835649 -> 8802483884 (+0.0%)
Cycles helped: 9415
Cycles hurt: 9678
Spills in all programs: 6917 -> 6476 (-6.4%)
Spills helped: 111
Spills hurt: 3
Fills in all programs: 7584 -> 7354 (-3.0%)
Fills helped: 111
Fills hurt: 3
Lost: 5
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
It is very common to have bcsel where the second and third sources are
both constants. This results in a situation where we would want to emit
a SEL with two constant sources, but that's not allowed.
Previously, we would load both constants into registers, then let
constant propagation copy the last constant into the SEL instruction.
This results in the constant using an entire SIMD register instead of a
single channel.
Instead, copy propagate both sources, then let the combine-constants
pass do its thing. In the worst case, this stores the constant in a
single channel of the SIMD register. In the best case, it reuses a
value that was loaded into a register to satisfy another instruction.
shader-db results:
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951549 -> 19948709 (-0.01%)
instructions in affected programs: 482795 -> 479955 (-0.59%)
helped: 1184 / HURT: 3
total cycles in shared programs: 858584724 -> 858205341 (-0.04%)
cycles in affected programs: 356168375 -> 355788992 (-0.11%)
helped: 1448 / HURT: 1195
total spills in shared programs: 6569 -> 6255 (-4.78%)
spills in affected programs: 912 -> 598 (-34.43%)
helped: 58 / HURT: 0
total fills in shared programs: 8218 -> 7813 (-4.93%)
fills in affected programs: 1570 -> 1165 (-25.80%)
helped: 58 / HURT: 0
LOST: 6
GAINED: 16
Broadwell
total instructions in shared programs: 17819660 -> 17819389 (<.01%)
instructions in affected programs: 1078129 -> 1077858 (-0.03%)
helped: 1067 / HURT: 304
total cycles in shared programs: 904722624 -> 905035016 (0.03%)
cycles in affected programs: 362583117 -> 362895509 (0.09%)
helped: 1381 / HURT: 1123
total spills in shared programs: 17884 -> 17922 (0.21%)
spills in affected programs: 5088 -> 5126 (0.75%)
helped: 55 / HURT: 152
total fills in shared programs: 25533 -> 26290 (2.96%)
fills in affected programs: 12992 -> 13749 (5.83%)
helped: 61 /HURT: 295
LOST: 7
GAINED: 24
Haswell
total instructions in shared programs: 16678080 -> 16673976 (-0.02%)
instructions in affected programs: 1162893 -> 1158789 (-0.35%)
helped: 1584 / HURT: 7
total cycles in shared programs: 880180082 -> 879932525 (-0.03%)
cycles in affected programs: 364067522 -> 363819965 (-0.07%)
helped: 1226 / HURT: 976
total spills in shared programs: 14937 -> 14428 (-3.41%)
spills in affected programs: 7866 -> 7357 (-6.47%)
helped: 351 / HURT: 5
total fills in shared programs: 17572 -> 16975 (-3.40%)
fills in affected programs: 11028 -> 10431 (-5.41%)
helped: 350 / HURT: 3
LOST: 8
GAINED: 16
Ivy Bridge
total instructions in shared programs: 15704044 -> 15703158 (<.01%)
instructions in affected programs: 304513 -> 303627 (-0.29%)
helped: 707 / HURT: 0
total cycles in shared programs: 433560149 -> 433471118 (-0.02%)
cycles in affected programs: 19299650 -> 19210619 (-0.46%)
helped: 687 / HURT: 395
LOST: 2
GAINED: 9
Sandy Bridge
total instructions in shared programs: 13913386 -> 13912884 (<.01%)
instructions in affected programs: 195687 -> 195185 (-0.26%)
helped: 455 / HURT: 0
total cycles in shared programs: 741156272 -> 741136266 (<.01%)
cycles in affected programs: 10934349 -> 10914343 (-0.18%)
helped: 578 / HURT: 289
LOST: 9
GAINED: 4
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8364056 -> 8364042 (<.01%)
instructions in affected programs: 5178 -> 5164 (-0.27%)
helped: 10 / HURT: 0
total cycles in shared programs: 248759794 -> 248757940 (<.01%)
cycles in affected programs: 4305246 -> 4303392 (-0.04%)
helped: 183 / HURT: 24
fossil-db results:
Tiger Lake
Instructions in all programs: 156943594 -> 156802601 (-0.1%)
Instructions helped: 20595
Instructions hurt: 23248
Cycles in all programs: 7512086950 -> 7528386387 (+0.2%)
Cycles helped: 29531
Cycles hurt: 27837
Spills in all programs: 13500 -> 5643 (-58.2%)
Spills helped: 394
Spills hurt: 22
Fills in all programs: 18943 -> 6306 (-66.7%)
Fills helped: 394
Fills hurt: 11
Gained: 93
Lost: 76
Ice Lake
Instructions in all programs: 141395899 -> 141249621 (-0.1%)
Instructions helped: 30067
Instructions hurt: 3
Cycles in all programs: 9097127057 -> 9089668235 (-0.1%)
Cycles helped: 32268
Cycles hurt: 24315
Spills in all programs: 13695 -> 7564 (-44.8%)
Spills helped: 403
Fills in all programs: 18400 -> 8494 (-53.8%)
Fills helped: 403
Gained: 114
Lost: 137
Skylake
Instructions in all programs: 131948328 -> 131826063 (-0.1%)
Instructions helped: 29968
Instructions hurt: 3
Cycles in all programs: 8794778440 -> 8793934844 (-0.0%)
Cycles helped: 32705
Cycles hurt: 23575
Spills in all programs: 10526 -> 7039 (-33.1%)
Spills helped: 403
Fills in all programs: 11025 -> 7728 (-29.9%)
Fills helped: 403
Gained: 102
Lost: 250
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
The is a squash of what in the original MR was "util: Add generic pass
that tries to combine constants" and "intel/fs: Switch to using
util_combine_constants".
The new algorithm uses a multi-pass greedy algorithm that attempts to
collect constants for loading in order of increasing degrees of freedom.
The first pass collects constants that must be emitted as-is (e.g.,
without source modifiers).
The second pass emits all constants that must be emitted (because they
are used in a source field that cannot be a literal constant) but that
can have a source modifier.
The final pass possibly emits constants that may not have to be emitted.
This is used for instructions where one of the fields is allowed to be a
constant. This is not used in the current commit, but future commits
that enable SEL will use this. The SEL instruction can have a single
constant, but when both sources are constant, one of the sources has to
be loaded into a register.
By loading constants in this order, required "choices" made in earlier
passes may be re-used in later passes. This provides a more optimal
result.
At this point in the series, most platforms have the same results with
the new implementation. Gen7 platforms see a significant number of
"small" changes. Due to the coissue optimization on Gen7, each shader
is likely to have most constants affected by constant combining.
If a shader has only a single basic block, constants are packed into
registers in the order produced by the constant combining process.
Since each constant has a different live range in the shader, even
slightly different packing orders can have dramatic effects on the live
range of a register. Even in cases where this does not affect register
pressure in a meaningful way, it can cause the scheduler to make very
different choices about the ordering of instructions.
From my analysis (using the `if (debug) { ... }` block at the end of
fs_visitor::opt_combine_constants), the old implementation and the new
implementation pick the same set of constants, but the order produced
may be slightly different. For the smaller number of values in non-Gfx7
shaders, the orders are similar enough to not matter.
No shader-db or fossil-db changes on any non-Gfx7 platforms.
Haswell and Ivy Bridge had similar results. (Haswell shown)
total cycles in shared programs: 879930036 -> 880001666 (<.01%)
cycles in affected programs: 22485040 -> 22556670 (0.32%)
helped: 1879
HURT: 2309
helped stats (abs) min: 1 max: 6296 x̄: 258.54 x̃: 34
helped stats (rel) min: <.01% max: 54.63% x̄: 3.88% x̃: 0.87%
HURT stats (abs) min: 1 max: 9739 x̄: 241.41 x̃: 40
HURT stats (rel) min: <.01% max: 160.50% x̄: 6.01% x̃: 0.99%
95% mean confidence interval for cycles value: -1.04 35.25
95% mean confidence interval for cycles %-change: 1.23% 1.92%
Inconclusive result (value mean confidence interval includes 0).
LOST: 82
GAINED: 39
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>
Sources that are already W, UW, or HF can be represented as those types
by definition. Pass them through. Previously an HF source on a MAD would
have been marked as !can_promote. I'm pretty sure this means it would
get moved out to a register, but I did not verify this.
For ADD3, a constant source could be D or UD. In this case, the value
must be tested to determine whether it can be represented as W or
UW. The patterns in opt_algebraic won't generate an ADD3 with constant
source, so this problem cannot occur yet.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
This is a bit more wordy, but it will greatly simplify some future
changes.
v2: Rebase on ADD3 changes.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
It's a lot more useful to have it in the same stream with the
INTEL_DEBUG=fs output.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
CoverityID: 1487496
Fixes: cde9ca616d "intel/compiler: Make decision based on source type instead of opcode"
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11985>
This patch restructure code a little bit to check if source can be
represented as immediate operand. This is a foundation for next patch
which add checks for integer operand as well.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
[ Francisco Jerez: Add TODO comment explaining why this is helpful and
how we could better fix it. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
This only does half of the work. The actual representation of the
idom tree is left untouched, but the computation algorithm is moved
into a separate analysis result class wrapped in a BRW_ANALYSIS
object, along with the intersect() and dump_domtree() auxiliary
functions in order to keep things tidy.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR. For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.
This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
On Gen12, we support mixed mode HF/F operands, and also 3 source
instruction supports immediate value support, so keep immediate as it
is, if it fits properly in 16 bit field.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.
v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
warning: struct 'fs_reg' was previously declared as a class
Fixes: e64be391 ("intel/compiler: generalize the combine constants pass")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Found by GCC warning:
src/intel/compiler/brw_fs_combine_constants.cpp: In function ‘bool needs_negate(const fs_reg*, const imm*)’:
src/intel/compiler/brw_fs_combine_constants.cpp:306:34: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
return ((reg->d & 0xffffu) < 0) != (imm->w < 0);
~~~~~~~~~~~~~~~~~~~^~~
The result of the bit-and is a 32-bit value with the top bits all zero.
This will never be < 0. Instead of masking off the bits, just cast to
int16_t and let the compiler handle the actual conversion.
Fixes: e64be391dd ("intel/compiler: generalize the combine constants pass")
Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At the very least we need it to handle HF too, since we are doing
constant propagation for MAD and LRP, which relies on this pass
to promote the immediates to GRF in the end, but ideally
we want it to support even more types so we can take advantage
of it to improve register pressure in some scenarios.
v2 (Jason):
- Support 64-bit types too.
- Check if we need to set the half-float flag if the immediate already
existed.
- Multiply the size of the immediate by the width of the copy
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A follow on commit will move nr to the same union as the immediate
data, so we should assert these invariants before we overwrite the nr
field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Renamed from src/mesa/drivers/dri/i965/brw_fs_combine_constants.cpp (Browse further)