fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 20:00:10 +01:00

Author	SHA1	Message	Date
Ian Romanick	44d62a5224	intel/fs: Completely re-write the combine constants pass The is a squash of what in the original MR was "util: Add generic pass that tries to combine constants" and "intel/fs: Switch to using util_combine_constants". The new algorithm uses a multi-pass greedy algorithm that attempts to collect constants for loading in order of increasing degrees of freedom. The first pass collects constants that must be emitted as-is (e.g., without source modifiers). The second pass emits all constants that must be emitted (because they are used in a source field that cannot be a literal constant) but that can have a source modifier. The final pass possibly emits constants that may not have to be emitted. This is used for instructions where one of the fields is allowed to be a constant. This is not used in the current commit, but future commits that enable SEL will use this. The SEL instruction can have a single constant, but when both sources are constant, one of the sources has to be loaded into a register. By loading constants in this order, required "choices" made in earlier passes may be re-used in later passes. This provides a more optimal result. At this point in the series, most platforms have the same results with the new implementation. Gen7 platforms see a significant number of "small" changes. Due to the coissue optimization on Gen7, each shader is likely to have most constants affected by constant combining. If a shader has only a single basic block, constants are packed into registers in the order produced by the constant combining process. Since each constant has a different live range in the shader, even slightly different packing orders can have dramatic effects on the live range of a register. Even in cases where this does not affect register pressure in a meaningful way, it can cause the scheduler to make very different choices about the ordering of instructions. From my analysis (using the `if (debug) { ... }` block at the end of fs_visitor::opt_combine_constants), the old implementation and the new implementation pick the same set of constants, but the order produced may be slightly different. For the smaller number of values in non-Gfx7 shaders, the orders are similar enough to not matter. No shader-db or fossil-db changes on any non-Gfx7 platforms. Haswell and Ivy Bridge had similar results. (Haswell shown) total cycles in shared programs: 879930036 -> 880001666 (<.01%) cycles in affected programs: 22485040 -> 22556670 (0.32%) helped: 1879 HURT: 2309 helped stats (abs) min: 1 max: 6296 x̄: 258.54 x̃: 34 helped stats (rel) min: <.01% max: 54.63% x̄: 3.88% x̃: 0.87% HURT stats (abs) min: 1 max: 9739 x̄: 241.41 x̃: 40 HURT stats (rel) min: <.01% max: 160.50% x̄: 6.01% x̃: 0.99% 95% mean confidence interval for cycles value: -1.04 35.25 95% mean confidence interval for cycles %-change: 1.23% 1.92% Inconclusive result (value mean confidence interval includes 0). LOST: 82 GAINED: 39 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>	2023-08-29 19:01:36 +00:00
Ian Romanick	9a9a86013c	intel/fs: Allow HF const in MAD on Gfx12.5 if all sources are HF Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>	2023-06-06 06:10:53 +00:00
Ian Romanick	4f272bf001	intel/fs: Fix handling of W, UW, and HF constants in combine_constants Sources that are already W, UW, or HF can be represented as those types by definition. Pass them through. Previously an HF source on a MAD would have been marked as !can_promote. I'm pretty sure this means it would get moved out to a register, but I did not verify this. For ADD3, a constant source could be D or UD. In this case, the value must be tested to determine whether it can be represented as W or UW. The patterns in opt_algebraic won't generate an ADD3 with constant source, so this problem cannot occur yet. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>	2023-06-06 06:10:53 +00:00
Ian Romanick	2016d9f46c	intel/fs: Rework the loop of opt_combine_constants that collects constants This is a bit more wordy, but it will greatly simplify some future changes. v2: Rebase on ADD3 changes. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Ian Romanick	9e4bb4bfcf	intel/fs: Refactor part of opt_combine_constants to a separate function Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Ian Romanick	593cde0432	intel/fs: Output opt_combine_constants debug to stderr It's a lot more useful to have it in the same stream with the INTEL_DEBUG=fs output. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>	2023-04-03 21:50:06 +00:00
Sagar Ghuge	0608e76e00	intel/compiler: Fix missing break in switch CoverityID: 1487496 Fixes: `cde9ca616d` "intel/compiler: Make decision based on source type instead of opcode" Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11985>	2021-07-22 23:38:04 +00:00
Sagar Ghuge	e6db2299a8	intel/compiler: Allow ternary add to promote source to immediate Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>	2021-07-16 15:59:56 +00:00
Sagar Ghuge	cde9ca616d	intel/compiler: Make decision based on source type instead of opcode This patch restructure code a little bit to check if source can be represented as immediate operand. This is a foundation for next patch which add checks for integer operand as well. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>	2021-07-16 15:59:56 +00:00
Anuj Phogat	61e8636557	intel: Rename gen_device prefix to intel_device export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen_device" -rIl $SEARCH_PATH \| xargs sed -ie "s/gen_device/intel_device/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Jordan Justen	262cb08557	intel/fs: Disable 3-src immediates on XeHP. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> [ Francisco Jerez: Add TODO comment explaining why this is helpful and how we could better fix it. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>	2021-04-16 08:27:35 +00:00
Anuj Phogat	abe9a71a09	intel: Rename gen field in gen_device_info struct to ver Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "info\)(.\|->)gen" -rIl $SEARCH_PATH \| xargs sed -ie "s/info$)$$\.\\|->$gen/info\1\2ver/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Francisco Jerez	c2a7eababf	intel/compiler: Move idom tree calculation and related logic into analysis object This only does half of the work. The actual representation of the idom tree is left untouched, but the computation algorithm is moved into a separate analysis result class wrapped in a BRW_ANALYSIS object, along with the intersect() and dump_domtree() auxiliary functions in order to keep things tidy. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>	2020-03-06 10:21:03 -08:00
Francisco Jerez	ab6d792986	intel/compiler: Pass detailed dependency classes to invalidate_analysis() Have fun reading through the whole back-end optimizer to verify whether I've missed any dependency flags -- Or alternatively, just trust that any mistake here will trigger an assertion failure during analysis pass validation if it ever poses a problem for the consistency of any of the analysis passes managed by the framework. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>	2020-03-06 10:20:39 -08:00
Francisco Jerez	d966a6b4c4	intel/compiler: Introduce backend_shader method to propagate IR changes to analysis passes The invalidate_analysis() method knows what analysis passes there are in the back-end and calls their invalidate() method to report changes in the IR. For the moment it just calls invalidate_live_intervals() (which will eventually be fully replaced by this function) if anything changed. This makes all optimization passes invalidate DEPENDENCY_EVERYTHING, which is clearly far from ideal -- The dependency classes passed to invalidate_analysis() will be refined in a future commit. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>	2020-03-06 10:20:32 -08:00
Ian Romanick	59488cbbac	intel/fs: Don't count integer instructions as being possibly coissue Integer instructions don't coissue. Before `e64be391dd` ("intel/compiler: generalize the combine constants pass"), this pass only looked at float sources. There's no shader-db data in that commit, so I collected some. The results are not good: Haswell total instructions in shared programs: 11898805 -> 11908127 (0.08%) instructions in affected programs: 1218680 -> 1228002 (0.76%) helped: 2 HURT: 5171 helped stats (abs) min: 12 max: 111 x̄: 61.50 x̃: 61 helped stats (rel) min: 1.59% max: 9.20% x̄: 5.40% x̃: 5.40% HURT stats (abs) min: 1 max: 311 x̄: 1.83 x̃: 1 HURT stats (rel) min: 0.02% max: 9.91% x̄: 1.05% x̃: 0.70% 95% mean confidence interval for instructions value: 1.55 2.05 95% mean confidence interval for instructions %-change: 1.02% 1.08% Instructions are HURT. total cycles in shared programs: 221664974 -> 221404750 (-0.12%) cycles in affected programs: 120012620 -> 119752396 (-0.22%) helped: 3464 HURT: 3159 helped stats (abs) min: 1 max: 428160 x̄: 314.55 x̃: 16 helped stats (rel) min: <.01% max: 57.33% x̄: 3.40% x̃: 1.28% HURT stats (abs) min: 1 max: 87846 x̄: 262.54 x̃: 14 HURT stats (rel) min: <.01% max: 85.57% x̄: 3.01% x̃: 0.77% 95% mean confidence interval for cycles value: -224.23 145.65 95% mean confidence interval for cycles %-change: -0.50% -0.19% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 9804 -> 10047 (2.48%) spills in affected programs: 6869 -> 7112 (3.54%) helped: 2 HURT: 41 total fills in shared programs: 19863 -> 20319 (2.30%) fills in affected programs: 17428 -> 17884 (2.62%) helped: 2 HURT: 41 LOST: 20 GAINED: 13 This also prevents regressions in "intel/fs: Promote integer constants after lowering integer multiplication" (note: that patch will probably not be committed). When the passes are reorderd, code like mul(8) acc0<1>D g9<8,8,1>D -2078209981D { align1 1Q }; gets turned into mov(1) g23<1>D 2078209981D { align1 WE_all 1N }; ... mul(8) acc0<1>D g13<8,8,1>D -g23<0,1,0>D { align1 1Q compacted }; It's not 100% clear why, but these produce different results. Note that -2078209981 & 0x0ffff = 0x0843, and -(2078209981 & 0x0ffff) = 0xffff0843. It seems like the upper 16-bits of the negation should be ignored. Fixes: `e64be391dd` ("intel/compiler: generalize the combine constants pass") Cc: Iago Toral Quiroga <itoral@igalia.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> The shaders with spills or fills hurt are the usual suspects. A couple compute shaders in Dirt Showdown and a compute shader in Bioshock Infinite. On Haswell, a compute shader (that appears twice in shader-db) from Aztec Ruins was also hurt for spill and fills. Haswell total instructions in shared programs: 11573934 -> 11568335 (-0.05%) instructions in affected programs: 828623 -> 823024 (-0.68%) helped: 2825 HURT: 6 helped stats (abs) min: 1 max: 134 x̄: 2.16 x̃: 1 helped stats (rel) min: 0.02% max: 9.05% x̄: 0.84% x̃: 0.61% HURT stats (abs) min: 1 max: 216 x̄: 81.83 x̃: 56 HURT stats (rel) min: 0.16% max: 8.65% x̄: 4.21% x̃: 4.68% 95% mean confidence interval for instructions value: -2.31 -1.64 95% mean confidence interval for instructions %-change: -0.85% -0.80% Instructions are helped. total cycles in shared programs: 187573593 -> 187004633 (-0.30%) cycles in affected programs: 82816107 -> 82247147 (-0.69%) helped: 2186 HURT: 1741 helped stats (abs) min: 1 max: 35230 x̄: 326.96 x̃: 16 helped stats (rel) min: <.01% max: 46.11% x̄: 3.11% x̃: 0.90% HURT stats (abs) min: 1 max: 6138 x̄: 83.73 x̃: 16 HURT stats (rel) min: <.01% max: 104.11% x̄: 2.73% x̃: 0.75% 95% mean confidence interval for cycles value: -197.13 -92.64 95% mean confidence interval for cycles %-change: -0.72% -0.33% Cycles are helped. total spills in shared programs: 7870 -> 7743 (-1.61%) spills in affected programs: 2260 -> 2133 (-5.62%) helped: 31 HURT: 5 total fills in shared programs: 6320 -> 6263 (-0.90%) fills in affected programs: 3547 -> 3490 (-1.61%) helped: 31 HURT: 6 LOST: 9 GAINED: 9 Ivybridge total instructions in shared programs: 11863372 -> 11859793 (-0.03%) instructions in affected programs: 757183 -> 753604 (-0.47%) helped: 2236 HURT: 3 helped stats (abs) min: 1 max: 81 x̄: 1.86 x̃: 1 helped stats (rel) min: 0.03% max: 5.26% x̄: 0.74% x̃: 0.48% HURT stats (abs) min: 11 max: 301 x̄: 192.33 x̃: 265 HURT stats (rel) min: 1.55% max: 10.51% x̄: 6.89% x̃: 8.62% 95% mean confidence interval for instructions value: -2.01 -1.18 95% mean confidence interval for instructions %-change: -0.77% -0.70% Instructions are helped. total cycles in shared programs: 178377378 -> 177946087 (-0.24%) cycles in affected programs: 76261390 -> 75830099 (-0.57%) helped: 1635 HURT: 1395 helped stats (abs) min: 1 max: 34796 x̄: 333.53 x̃: 16 helped stats (rel) min: <.01% max: 47.15% x̄: 2.82% x̃: 0.64% HURT stats (abs) min: 1 max: 4315 x̄: 81.74 x̃: 18 HURT stats (rel) min: <.01% max: 49.98% x̄: 1.99% x̃: 0.53% 95% mean confidence interval for cycles value: -197.06 -87.62 95% mean confidence interval for cycles %-change: -0.78% -0.43% Cycles are helped. total spills in shared programs: 4188 -> 4182 (-0.14%) spills in affected programs: 1557 -> 1551 (-0.39%) helped: 30 HURT: 3 total fills in shared programs: 5056 -> 5245 (3.74%) fills in affected programs: 2708 -> 2897 (6.98%) helped: 30 HURT: 3 LOST: 5 GAINED: 1 No shader-db changes on any other Intel platform. Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3544> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3544>	2020-02-05 15:13:17 +00:00
Sagar Ghuge	18b28b5654	intel/compiler: Don't move immediate in register On Gen12, we support mixed mode HF/F operands, and also 3 source instruction supports immediate value support, so keep immediate as it is, if it fits properly in 16 bit field. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-10-21 20:32:43 -07:00
Maya Rashish	e16fadd545	intel/compiler: avoid truncating int64_t to int Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Maya Rashish <maya@netbsd.org>	2019-09-26 17:46:26 +00:00
Matt Turner	dabb5d4bee	i965/fs: Add a shader_stats struct. It'll grow further, and we'd like to avoid adding an additional parameter to fs_generator() for each new piece of data. v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-30 14:35:43 -07:00
Andrii Simiklit	fa2fc68de1	intel/compiler: don't use a keyword struct for a class fs_reg warning: struct 'fs_reg' was previously declared as a class Fixes: `e64be391` ("intel/compiler: generalize the combine constants pass") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-07-24 13:26:42 +00:00
Ian Romanick	21223acf7d	intel/fs: Fix D to W conversion in opt_combine_constants Found by GCC warning: src/intel/compiler/brw_fs_combine_constants.cpp: In function ‘bool needs_negate(const fs_reg, const imm)’: src/intel/compiler/brw_fs_combine_constants.cpp:306:34: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] return ((reg->d & 0xffffu) < 0) != (imm->w < 0); ~~~~~~~~~~~~~~~~~~~^~~ The result of the bit-and is a 32-bit value with the top bits all zero. This will never be < 0. Instead of masking off the bits, just cast to int16_t and let the compiler handle the actual conversion. Fixes: `e64be391dd` ("intel/compiler: generalize the combine constants pass") Cc: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-23 19:48:33 -07:00
Iago Toral Quiroga	e64be391dd	intel/compiler: generalize the combine constants pass At the very least we need it to handle HF too, since we are doing constant propagation for MAD and LRP, which relies on this pass to promote the immediates to GRF in the end, but ideally we want it to support even more types so we can take advantage of it to improve register pressure in some scenarios. v2 (Jason): - Support 64-bit types too. - Check if we need to set the half-float flag if the immediate already existed. - Multiply the size of the immediate by the width of the copy Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Matt Turner	2b801b6668	intel/compiler: Rearrange code to avoid future problems A follow on commit will move nr to the same union as the immediate data, so we should assert these invariants before we overwrite the nr field. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Jason Ekstrand	700bebb958	i965: Move the back-end compiler to src/intel/compiler Mostly a dummy git mv with a couple of noticable parts: - With the earlier header cleanups, nothing in src/intel depends files from src/mesa/drivers/dri/i965/ - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter - brw_util.[ch] is not really compiler specific, so it's moved to i965. v2: - move brw_eu_defines.h instead of brw_defines.h - remove no-longer applicable includes - add missing vulkan/ prefix in the Android build (thanks Tapani) v3: - don't list brw_defines.h in src/intel/Makefile.sources (Jason) - rebase on top of the oa patches [Emil Velikov: commit message, various small fixes througout] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:34 +00:00

24 commits