fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-25 06:08:21 +02:00

Author	SHA1	Message	Date
Ian Romanick	281f20e26d	nir/algebraic: Strip double negatives from comparison sources All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224623 -> 17224337 (<.01%) instructions in affected programs: 32648 -> 32362 (-0.88%) helped: 148 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2 helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08% 95% mean confidence interval for instructions value: -1.97 -1.89 95% mean confidence interval for instructions %-change: -1.15% -1.00% Instructions are helped. total cycles in shared programs: 360828714 -> 360826090 (<.01%) cycles in affected programs: 347416 -> 344792 (-0.76%) helped: 148 HURT: 26 helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18 helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41% HURT stats (abs) min: 2 max: 337 x̄: 48.96 x̃: 6 HURT stats (rel) min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27% 95% mean confidence interval for cycles value: -23.78 -6.38 95% mean confidence interval for cycles %-change: -1.59% -0.79% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	45c7ff95fc	intel/compiler: Repeat nir_opt_algebraic_late A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles. v2: Fix typo in comment. Noticed by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224634 -> 17224623 (<.01%) instructions in affected programs: 4586 -> 4575 (-0.24%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.36% -0.19% Instructions are helped. total cycles in shared programs: 360828542 -> 360828714 (<.01%) cycles in affected programs: 151159 -> 151331 (0.11%) helped: 49 HURT: 28 helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6 helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42% HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15 HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88% 95% mean confidence interval for cycles value: -13.48 17.95 95% mean confidence interval for cycles %-change: -0.69% 0.84% Inconclusive result (value mean confidence interval includes 0). Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13529544 -> 13529542 (<.01%) instructions in affected programs: 358 -> 356 (-0.56%) helped: 2 HURT: 0 total cycles in shared programs: 357290311 -> 357289678 (<.01%) cycles in affected programs: 178324 -> 177691 (-0.35%) helped: 48 HURT: 40 helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13 helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66% HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6 HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31% 95% mean confidence interval for cycles value: -18.28 3.89 95% mean confidence interval for cycles %-change: -1.01% 0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8159110 -> 8158980 (<.01%) instructions in affected programs: 22719 -> 22589 (-0.57%) helped: 65 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74% 95% mean confidence interval for instructions value: -2.06 -1.94 95% mean confidence interval for instructions %-change: -0.78% -0.68% Instructions are helped. total cycles in shared programs: 188609448 -> 188609214 (<.01%) cycles in affected programs: 1875852 -> 1875618 (-0.01%) helped: 109 HURT: 104 helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4 helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07% HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2 HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for cycles value: -1.95 -0.25 95% mean confidence interval for cycles %-change: -0.04% -0.01% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	d2a9ba03e3	Revert "nir: add late opt to turn inot/b2f combos back to bcsel" This reverts commit `7acc865226`. With these optimizations in place, the extra constant folding added in the next commit extends some live ranges of 0.0 and ±1.0 constants, and that causes several hundred shaders to have more spills and fills. I believe this optimization we made basically irrelevant by `7725d60938` "intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))". All Gen7.5+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225303 -> 17224634 (<.01%) instructions in affected programs: 879402 -> 878733 (-0.08%) helped: 679 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.02 -0.95 95% mean confidence interval for instructions %-change: -0.26% -0.22% Instructions are helped. total cycles in shared programs: 360842595 -> 360828542 (<.01%) cycles in affected programs: 110443594 -> 110429541 (-0.01%) helped: 389 HURT: 265 helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28 helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11% HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48 HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10% 95% mean confidence interval for cycles value: -75.65 32.67 95% mean confidence interval for cycles %-change: -0.49% -0.06% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 12159 -> 12161 (0.02%) spills in affected programs: 13 -> 15 (15.38%) helped: 0 HURT: 1 total fills in shared programs: 25207 -> 25208 (<.01%) fills in affected programs: 25 -> 26 (4.00%) helped: 0 HURT: 1 Ivy Bridge total instructions in shared programs: 12082019 -> 12082013 (<.01%) instructions in affected programs: 1033 -> 1027 (-0.58%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.78% -0.45% Instructions are helped. total cycles in shared programs: 179849270 -> 179849157 (<.01%) cycles in affected programs: 4735 -> 4622 (-2.39%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18 helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36% 95% mean confidence interval for cycles value: -82.73 26.23 95% mean confidence interval for cycles %-change: -7.98% 2.28% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10882750 -> 10882748 (<.01%) instructions in affected programs: 266 -> 264 (-0.75%) helped: 2 HURT: 0 Iron Lake total cycles in shared programs: 188609440 -> 188609448 (<.01%) cycles in affected programs: 4320 -> 4328 (0.19%) helped: 0 HURT: 2 GM45 total cycles in shared programs: 129016868 -> 129016872 (<.01%) cycles in affected programs: 2302 -> 2306 (0.17%) helped: 0 HURT: 1 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3cb091f8b4	nir/algebraic: Eliminate a tautological compare The value-range tracking pass that is coming is not clever enough to know that the result of the ffma must be non-negative. Making it that smart will require quite a bit of work. It might be possible to add a special case that detects that a whole tree of fadd(fmul(fsat(a), fneg(fsat(a))), 1.0) cannot be negative. For cases when the comparison is used in the domain guard for a square-root (see nir/algebraic: Simplify fsqrt domain guard), the compare may be converted to a fmax. This patch also handles that case. All of the affected cases are in DiRT: Showdown. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225365 -> 17225303 (<.01%) instructions in affected programs: 40051 -> 39989 (-0.15%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.31% -0.22% Instructions are helped. total cycles in shared programs: 360842788 -> 360842595 (<.01%) cycles in affected programs: 1818081 -> 1817888 (-0.01%) helped: 29 HURT: 22 helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14 helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42% HURT stats (abs) min: 1 max: 108 x̄: 18.45 x̃: 7 HURT stats (rel) min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19% 95% mean confidence interval for cycles value: -14.48 6.91 95% mean confidence interval for cycles %-change: -0.71% 0.21% Inconclusive result (value mean confidence interval includes 0). No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	9725e45b3d	nir/algebraic: Simplify fsqrt domain guard All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228376 -> 17225365 (-0.02%) instructions in affected programs: 280732 -> 277721 (-1.07%) helped: 1072 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2 helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07% 95% mean confidence interval for instructions value: -2.92 -2.70 95% mean confidence interval for instructions %-change: -1.48% -1.37% Instructions are helped. total cycles in shared programs: 360935690 -> 360842788 (-0.03%) cycles in affected programs: 7838017 -> 7745115 (-1.19%) helped: 1569 HURT: 69 helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20 helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12% HURT stats (abs) min: 1 max: 2820 x̄: 98.22 x̃: 47 HURT stats (rel) min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31% 95% mean confidence interval for cycles value: -63.55 -49.89 95% mean confidence interval for cycles %-change: -3.33% -2.96% Cycles are helped. No changes on any other platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	e2ad047779	nir/search: Don't compare 8-bit or 1-bit constants with floats Without this, adding an algebraic rule like (('bcsel', ('flt', a, 0.0), 0.0, ...), ...), will cause assertion failures inside nir_src_comp_as_float in GTF-GL46.gtf21.GL.lessThan.lessThan_vec3_frag (and related tests) from the OpenGL CTS and shaders/closed/steam/witcher-2/511.shader_test from shader-db. All of these cases have some code that ends up like ('bcsel', ('flt', a, 0.0), 'b@1', ...) When the 'b@1' is tested, nir_src_comp_as_float fails because there's no such thing as a 1-bit float. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	5116646a76	nir/algebraic: Recognize open-coded fsat with modifiers This change also enables a later change (nir/algebraic: Replace 1-fsat(a) with fsat(1-a)) to affect more shaders. Almost all of the affected shaders are in Bioshock Infinite, and all of those shaders all require GLSL 4.10. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228584 -> 17228376 (<.01%) instructions in affected programs: 31438 -> 31230 (-0.66%) helped: 105 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1 helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70% 95% mean confidence interval for instructions value: -2.20 -1.76 95% mean confidence interval for instructions %-change: -0.80% -0.67% Instructions are helped. total cycles in shared programs: 360936431 -> 360935690 (<.01%) cycles in affected programs: 420100 -> 419359 (-0.18%) helped: 71 HURT: 21 helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10 helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48% HURT stats (abs) min: 1 max: 198 x̄: 29.90 x̃: 10 HURT stats (rel) min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90% 95% mean confidence interval for cycles value: -16.77 0.66 95% mean confidence interval for cycles %-change: -0.85% -0.06% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	c769641c8e	nir/algebraic: Push unary operations into source operands of fsat source Pushing a unary operation, like fneg, into the operation that generates its operand allows the fsat to be applied to the inner instruction instead of on a separate instruction that performs the unary operation. This changes fmul ssa_100, ssa_99, ssa_98 fmov.sat ssa_101, -ssa_100 into fmul.sat ssa_100, -ssa_99, ssa_98 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 17228658 -> 17228584 (<.01%) instructions in affected programs: 3163 -> 3089 (-2.34%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2 helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51% 95% mean confidence interval for instructions value: -1.66 -1.37 95% mean confidence interval for instructions %-change: -4.37% -3.00% Instructions are helped. total cycles in shared programs: 360937144 -> 360936431 (<.01%) cycles in affected programs: 24029 -> 23316 (-2.97%) helped: 47 HURT: 2 helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16 helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for cycles value: -16.05 -13.05 95% mean confidence interval for cycles %-change: -4.07% -3.15% Cycles are helped. All Gen7 and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13536059 -> 13535884 (<.01%) instructions in affected programs: 8797 -> 8622 (-1.99%) helped: 150 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96% 95% mean confidence interval for instructions value: -1.23 -1.11 95% mean confidence interval for instructions %-change: -3.97% -3.05% Instructions are helped. total cycles in shared programs: 357696119 -> 357694193 (<.01%) cycles in affected programs: 50216 -> 48290 (-3.84%) helped: 109 HURT: 14 helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16 helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37% HURT stats (abs) min: 2 max: 26 x̄: 10.14 x̃: 5 HURT stats (rel) min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92% 95% mean confidence interval for cycles value: -19.27 -12.05 95% mean confidence interval for cycles %-change: -7.34% -5.31% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3b74790941	nir/algebraic: Recognize open-coded flrp(a, b, fsat(c)) All Gen6+ GPUs had similar results. (Skylake shown) total instructions in shared programs: 15336712 -> 15336622 (<.01%) instructions in affected programs: 3952 -> 3862 (-2.28%) helped: 24 HURT: 0 helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4 helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46% 95% mean confidence interval for instructions value: -4.06 -3.44 95% mean confidence interval for instructions %-change: -2.47% -2.22% Instructions are helped. total cycles in shared programs: 355722052 -> 355721235 (<.01%) cycles in affected programs: 27326 -> 26509 (-2.99%) helped: 20 HURT: 4 helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14 helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23% HURT stats (abs) min: 2 max: 64 x̄: 19.50 x̃: 6 HURT stats (rel) min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55% 95% mean confidence interval for cycles value: -61.61 -6.47 95% mean confidence interval for cycles %-change: -5.59% -0.39% Cycles are helped. No changes on Ice Lake, Iron Lake, or GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:21 -07:00
Ian Romanick	a7724b1cbb	nir/algebraic: Add missing ffma(-1, a, b) pattern All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17229439 -> 17229377 (<.01%) instructions in affected programs: 9859 -> 9797 (-0.63%) helped: 41 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67% 95% mean confidence interval for instructions value: -1.88 -1.14 95% mean confidence interval for instructions %-change: -2.48% -0.81% Instructions are helped. total cycles in shared programs: 360944145 -> 360942989 (<.01%) cycles in affected programs: 178167 -> 177011 (-0.65%) helped: 36 HURT: 19 helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5 helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45% HURT stats (abs) min: 1 max: 34 x̄: 11.21 x̃: 6 HURT stats (rel) min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50% 95% mean confidence interval for cycles value: -36.01 -6.02 95% mean confidence interval for cycles %-change: -4.18% -0.57% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:03 -07:00
Ian Romanick	7b4ff6a1af	nir: Mark ffma as 2src_commutative This doesn't make any real difference now, but future work (not in this series) will add a LOT of ffma patterns. Having to duplicate all of them for ffma(a, b, c) and ffma(b, a, c) is just terrible. No shader-db changes on any Intel platform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	e049a9c92b	nir: Add support for 2src_commutative ops that have 3 sources v2: Instead of handling 3 sources as a special case, generalize with loops to N sources. Suggested by Jason. v3: Further generalize by only checking that number of sources is >= 2. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	ede45bf9cf	nir: Rename commutative to 2src_commutative The meaning of the new name is that the first two sources are commutative. Since this is only currently applied to two-source operations, there is no change. A future change will mark ffma as 2src_commutative. It is also possible that future work will add 3src_commutative for opcodes like fmin3. v2: s/commutative_2src/2src_commutative/g. I had originally considered this, but I discarded it because I did't want to deal with identifiers that (should) start with 2. Jason suggested it in review, so we decided that _2src_commutative would be used in nir_opcodes.py. Also add some comments documenting what 2src_commutative means. Also suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Jason Ekstrand	712f99934c	nir/validate: Use a single set for SSA def validation The current SSA def validation we do in nir_validate validates three things: 1. That each SSA def is only ever used in the function in which it is defined. 2. That an nir_src exists in an SSA def's use list if and only if it points to that SSA def. 3. That each nir_src is in the correct use list (uses or if_uses) based on whether it's an if condition or not. The way we were doing this before was that we had a hash table which provided a map from SSA def to a small ssa_def_validate_state data structure which contained a pointer to the nir_function_impl and two hash sets, one for each use list. This meant piles of allocation and creating of little hash sets. It also meant one hash lookup for each SSA def plus one per use as well as two per src (because we have to look up the ssa_def_validate_state and then look up the use.) It also involved a second walk over the instructions as a post-validate step. This commit changes us to use a single low-collision hash set of SSA sources for all of this by being a bit more clever. We accomplish the objectives above as follows: 1. The list is clear when we start validating a function. If the nir_src references an SSA def which is defined in a different function, it simply won't be in the set. 2. When validating the SSA defs, we walk the uses and verify that they have is_ssa set and that the SSA def points to the SSA def we're validating. This catches the case of a nir_src being in the wrong list. We then put the nir_src in the set and, when we validate the nir_src, we assert that it's in the set. This takes care of any cases where a nir_src isn't in the use list. After checking that the nir_src is in the set, we remove it from the set and, at the end of nir_function_impl validation, we assert that the set is empty. This takes care of any cases where a nir_src is in a use list but the instruction is no longer in the shader. 3. When we put a nir_src in the set, we set the bottom bit of the pointer to 1 if it's the condition of an if. This lets us detect whether or not a nir_src is in the right list. When running shader-db with an optimized debug build of mesa on my laptop, I get the following shader-db CPU times: With NIR_VALIDATE=0 3033.34 seconds Before this commit 20224.83 seconds After this commit 6255.50 seconds Assuming shader-db is a representative sampling of GLSL shaders, this means that making this change yields an 81% reduction in the time spent in nir_validate. It still isn't cheap but enabling validation now only increases compile times by 2x instead of 6.6x. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Jason Ekstrand	460567eabf	nir/validate: Use a ralloc context for our temporary data All of our hash tables and sets are already using ralloc. There's really no good reason why we don't just make a ralloc context rather than try to remember to clean everything up manually. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Ruslan Kabatsayev	974c4d679c	nir: Fix wrong sign in lower_rcp The nested fma calls were supposed to implement x_new = x + x * (1 - xsrc), but instead current code is equivalent to x_new = x - x (1 - x*src). The result is that Newton-Raphson steps don't improve precision at all. This patch fixes this problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-11 09:25:22 -07:00
Alyssa Rosenzweig	006cafc243	nir: Add blend_const_color_rgba sysval This represents a float vec4 constant color, as passed to glBlendColor. While the existing 4 shader sysvals are retained to minimize code churn, a single vectorized intrinsic is required for efficient blending on vector architectures. (This may also apply to archictectures like Bifrost where ALU is scalar but load/store is vector; it largely depends on how blending is implemented per-driver.) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:49:28 +00:00
Jonathan Marek	d0bff89159	nir: allow specifying a set of opcodes in lower_alu_to_scalar This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:10:41 +00:00
Ian Romanick	ed5f024515	nir/flrp: Reassociate add in flrp(±1, b, c) lowering path With this reassociation, this lowering path is still beneficial. Ice Lake total instructions in shared programs: 17220191 -> 17207181 (-0.08%) instructions in affected programs: 999871 -> 986861 (-1.30%) helped: 3703 HURT: 17 helped stats (abs) min: 1 max: 686 x̄: 3.52 x̃: 3 helped stats (rel) min: 0.09% max: 51.97% x̄: 2.21% x̃: 1.35% HURT stats (abs) min: 1 max: 9 x̄: 1.47 x̃: 1 HURT stats (rel) min: 0.08% max: 4.55% x̄: 0.78% x̃: 0.55% 95% mean confidence interval for instructions value: -4.01 -2.99 95% mean confidence interval for instructions %-change: -2.29% -2.11% Instructions are helped. total cycles in shared programs: 360871298 -> 360755040 (-0.03%) cycles in affected programs: 9931334 -> 9815076 (-1.17%) helped: 2388 HURT: 1569 helped stats (abs) min: 1 max: 10228 x̄: 93.54 x̃: 18 helped stats (rel) min: <.01% max: 74.11% x̄: 3.36% x̃: 1.07% HURT stats (abs) min: 1 max: 1917 x̄: 68.27 x̃: 22 HURT stats (rel) min: <.01% max: 44.90% x̄: 3.44% x̃: 1.72% 95% mean confidence interval for cycles value: -39.48 -19.28 95% mean confidence interval for cycles %-change: -0.86% -0.46% Cycles are helped. total spills in shared programs: 12355 -> 12159 (-1.59%) spills in affected programs: 295 -> 99 (-66.44%) helped: 2 HURT: 1 total fills in shared programs: 25398 -> 25207 (-0.75%) fills in affected programs: 288 -> 97 (-66.32%) helped: 2 HURT: 1 LOST: 3 GAINED: 44 Iron Lake total instructions in shared programs: 8169225 -> 8159729 (-0.12%) instructions in affected programs: 1025712 -> 1016216 (-0.93%) helped: 3352 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.83 x̃: 3 helped stats (rel) min: 0.15% max: 12.00% x̄: 1.51% x̃: 1.05% 95% mean confidence interval for instructions value: -2.86 -2.80 95% mean confidence interval for instructions %-change: -1.56% -1.46% Instructions are helped. total cycles in shared programs: 188656796 -> 188612280 (-0.02%) cycles in affected programs: 18633584 -> 18589068 (-0.24%) helped: 3085 HURT: 14 helped stats (abs) min: 2 max: 72 x̄: 14.45 x̃: 12 helped stats (rel) min: 0.02% max: 5.73% x̄: 0.73% x̃: 0.31% HURT stats (abs) min: 2 max: 4 x̄: 3.71 x̃: 4 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -14.55 -14.18 95% mean confidence interval for cycles %-change: -0.76% -0.69% Cycles are helped. GM45 total instructions in shared programs: 5026905 -> 5021856 (-0.10%) instructions in affected programs: 584169 -> 579120 (-0.86%) helped: 1776 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.84 x̃: 3 helped stats (rel) min: 0.15% max: 11.11% x̄: 1.43% x̃: 0.98% 95% mean confidence interval for instructions value: -2.88 -2.80 95% mean confidence interval for instructions %-change: -1.50% -1.37% Instructions are helped. total cycles in shared programs: 129047376 -> 129018918 (-0.02%) cycles in affected programs: 12941924 -> 12913466 (-0.22%) helped: 1722 HURT: 14 helped stats (abs) min: 4 max: 72 x̄: 16.56 x̃: 18 helped stats (rel) min: 0.02% max: 5.73% x̄: 0.72% x̃: 0.30% HURT stats (abs) min: 2 max: 4 x̄: 3.71 x̃: 4 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -16.65 -16.13 95% mean confidence interval for cycles %-change: -0.76% -0.66% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-08 07:41:54 -07:00
Ian Romanick	ba203a3cd7	nir/flrp: Fix typo on the flrp(±1, b, c) path After Samuel reported the bisect, I was able to find the bug by inspection. Good thing for well-named varibles. :) Unfortunately, this undoes almost all of the benefit of the original patch. Ice Lake total instructions in shared programs: 17183159 -> 17218166 (0.20%) instructions in affected programs: 1308722 -> 1343729 (2.67%) helped: 98 HURT: 4746 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.47% max: 2.70% x̄: 0.60% x̃: 0.57% HURT stats (abs) min: 1 max: 691 x̄: 7.40 x̃: 8 HURT stats (rel) min: 0.10% max: 700.00% x̄: 5.82% x̃: 2.83% 95% mean confidence interval for instructions value: 6.82 7.64 95% mean confidence interval for instructions %-change: 5.22% 6.15% Instructions are HURT. total cycles in shared programs: 360705959 -> 360853522 (0.04%) cycles in affected programs: 10754380 -> 10901943 (1.37%) helped: 1594 HURT: 3331 helped stats (abs) min: 1 max: 1896 x̄: 119.81 x̃: 60 helped stats (rel) min: <.01% max: 35.48% x̄: 5.06% x̃: 3.64% HURT stats (abs) min: 1 max: 10208 x̄: 101.63 x̃: 38 HURT stats (rel) min: 0.01% max: 878.95% x̄: 9.01% x̃: 2.78% 95% mean confidence interval for cycles value: 21.11 38.81 95% mean confidence interval for cycles %-change: 3.76% 5.15% Cycles are HURT. total spills in shared programs: 12158 -> 12355 (1.62%) spills in affected programs: 98 -> 295 (201.02%) helped: 1 HURT: 2 total fills in shared programs: 25204 -> 25398 (0.77%) fills in affected programs: 94 -> 288 (206.38%) helped: 0 HURT: 3 LOST: 15 GAINED: 8 Iron Lake total instructions in shared programs: 8121430 -> 8166733 (0.56%) instructions in affected programs: 1148353 -> 1193656 (3.95%) helped: 2 HURT: 4046 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.85% max: 1.92% x̄: 1.89% x̃: 1.89% HURT stats (abs) min: 1 max: 43 x̄: 11.20 x̃: 11 HURT stats (rel) min: 0.20% max: 716.67% x̄: 7.40% x̃: 3.87% 95% mean confidence interval for instructions value: 11.02 11.37 95% mean confidence interval for instructions %-change: 6.84% 7.94% Instructions are HURT. total cycles in shared programs: 188376326 -> 188601568 (0.12%) cycles in affected programs: 27416674 -> 27641916 (0.82%) helped: 68 HURT: 3947 helped stats (abs) min: 2 max: 222 x̄: 13.88 x̃: 6 helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01% HURT stats (abs) min: 2 max: 670 x̄: 57.31 x̃: 64 HURT stats (rel) min: <.01% max: 1811.11% x̄: 4.11% x̃: 1.09% 95% mean confidence interval for cycles value: 55.01 57.20 95% mean confidence interval for cycles %-change: 2.88% 5.19% Cycles are HURT. LOST: 35 GAINED: 3 GM45 total instructions in shared programs: 4979794 -> 5003551 (0.48%) instructions in affected programs: 635174 -> 658931 (3.74%) helped: 1 HURT: 2142 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.85% max: 1.85% x̄: 1.85% x̃: 1.85% HURT stats (abs) min: 1 max: 43 x̄: 11.09 x̃: 11 HURT stats (rel) min: 0.20% max: 716.67% x̄: 7.00% x̃: 3.53% 95% mean confidence interval for instructions value: 10.85 11.33 95% mean confidence interval for instructions %-change: 6.25% 7.74% Instructions are HURT. total cycles in shared programs: 128519586 -> 128654990 (0.11%) cycles in affected programs: 17635304 -> 17770708 (0.77%) helped: 46 HURT: 2088 helped stats (abs) min: 4 max: 220 x̄: 18.13 x̃: 6 helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01% HURT stats (abs) min: 2 max: 670 x̄: 65.25 x̃: 66 HURT stats (rel) min: <.01% max: 1464.29% x̄: 4.05% x̃: 0.99% 95% mean confidence interval for cycles value: 61.75 65.15 95% mean confidence interval for cycles %-change: 2.58% 5.34% Cycles are HURT. LOST: 38 GAINED: 38 Fixes: `5b908db604` ("nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently") Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-08 07:41:26 -07:00
Vasily Khoruzhick	e67e4e90b2	nir: implement lowering for fsin and fcos Lower sin and cos using Nick's fast sin/cos approximation from https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648 It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 15:25:21 +00:00
Ian Romanick	ab86926156	nir/algebraic: Reassociate open-coded flrp(1, b, c) In a previous verion of this patch, Jason commented, "Re-associating based on whether or not something has a constant value of 1.0 seems a bit sneaky. I think it's well within the rules but it seems like something that could bite you." That is possibly true. The reassociation will generate different results if fabs(b) >= 2**24 and fabs(c) < 0.5. The delta increases as fabs(c) approaches 0. However, i965 has done this same reassociation indirectly for years. We would previously allow nir_op_flrp on all pre-Gen11 hardware even though Gen4 and Gen5 do not have a LRP instruction. Optimizations in nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a, b, c). On Gen7+, the hardware performs the same arithmetic as a(1-c)+bc. Gen6 seems to implement LRP as a+c(b-a). On Gen4 and Gen5, we would lower LRP to a sequence of instructions that implement a(1-c)+bc. The lowering happens after all constant folding, so we would litterally generate a 1+(-1) instruction sequence in this scenario: one instruction to load either 1 or -1 in a register, and another instruction to add either -1 or 1 to it. This patch just cuts out the middle man. Do the reassociation that we've always done, but do it explicitly at a time when we can benefit from other optimizations. A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently" are restored by this patch. This includes a few shaders in ET:QW. I tried a similar thing for open-coded flrp(-1, b, c), and it hurt instructions on 35 shaders for ILK without helping any. The helped / hurt cycles was about even. No changes on any other Intel platforms. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8172020 -> 8164367 (-0.09%) instructions in affected programs: 1089851 -> 1082198 (-0.70%) helped: 3285 HURT: 64 helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2 helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38% 95% mean confidence interval for instructions value: -2.32 -2.25 95% mean confidence interval for instructions %-change: -1.16% -1.09% Instructions are helped. total cycles in shared programs: 188758338 -> 188719974 (-0.02%) cycles in affected programs: 20004922 -> 19966558 (-0.19%) helped: 3012 HURT: 477 helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12 helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24% HURT stats (abs) min: 2 max: 328 x̄: 4.27 x̃: 4 HURT stats (rel) min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11% 95% mean confidence interval for cycles value: -11.38 -10.62 95% mean confidence interval for cycles %-change: -0.46% -0.41% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	c995d1ca3a	nir/flrp: Lower flrp(a, b, #c) differently This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	ae02622d8f	nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists There is little effect on Intel GPUs now because we almost always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". No changes on any other Intel platforms. GM45 and Iron Lake had similar results. (Iron Lake shown) total cycles in shared programs: 188852500 -> 188852484 (<.01%) cycles in affected programs: 14612 -> 14596 (-0.11%) helped: 4 HURT: 0 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11% 95% mean confidence interval for cycles value: -4.00 -4.00 95% mean confidence interval for cycles %-change: -0.13% -0.09% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	6698d861a5	nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". No changes on any Intel platform. Before a number of large rebases this helped cycles in a couple shaders on Iron Lake and GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	5b908db604	nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently No changes on any other Intel platforms. v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8189888 -> 8153912 (-0.44%) instructions in affected programs: 1199037 -> 1163061 (-3.00%) helped: 4124 HURT: 10 helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9 helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02% HURT stats (abs) min: 1 max: 2 x̄: 1.20 x̃: 1 HURT stats (rel) min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06% 95% mean confidence interval for instructions value: -8.84 -8.56 95% mean confidence interval for instructions %-change: -5.12% -4.77% Instructions are helped. total cycles in shared programs: 188606710 -> 188426964 (-0.10%) cycles in affected programs: 27505596 -> 27325850 (-0.65%) helped: 4026 HURT: 77 helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46 helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85% HURT stats (abs) min: 2 max: 376 x̄: 17.79 x̃: 6 HURT stats (rel) min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04% 95% mean confidence interval for cycles value: -44.75 -42.87 95% mean confidence interval for cycles %-change: -2.44% -2.17% Cycles are helped. LOST: 3 GAINED: 35 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	23c5501b77	nir/flrp: Lower flrp(#a, #b, c) differently If the magnitudes of #a and #b are such that (b-a) won't lose too much precision, lower as a+c(b-a). No changes on any other Intel platforms. v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8192503 -> 8192383 (<.01%) instructions in affected programs: 18417 -> 18297 (-0.65%) helped: 68 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43% 95% mean confidence interval for instructions value: -2.48 -1.05 95% mean confidence interval for instructions %-change: -1.56% -0.63% Instructions are helped. total cycles in shared programs: 188662536 -> 188661956 (<.01%) cycles in affected programs: 744476 -> 743896 (-0.08%) helped: 62 HURT: 0 helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6 helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06% 95% mean confidence interval for cycles value: -12.37 -6.34 95% mean confidence interval for cycles %-change: -0.48% -0.06% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	158370ed2a	nir/flrp: Add new lowering pass for flrp instructions This pass will soon grow to include some optimizations that are difficult or impossible to implement correctly within nir_opt_algebraic. It also include the ability to generate strictly correct code which the current nir_opt_algebraic lowering lacks (though that could be changed). v2: Document the parameters to nir_lower_flrp. Rebase on top of `3766334923` ("compiler/nir: add lowering for 16-bit flrp") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Ian Romanick	dc566a033c	nir/algebraic: Pull common multiplication out of flrp arguments All Intel platforms had similar results. (Skylake shown) total instructions in shared programs: 15342485 -> 15337495 (-0.03%) instructions in affected programs: 217456 -> 212466 (-2.29%) helped: 1539 HURT: 1 helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3 helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56% 95% mean confidence interval for instructions value: -3.39 -3.09 95% mean confidence interval for instructions %-change: -3.24% -2.96% Instructions are helped. total cycles in shared programs: 355734320 -> 355728237 (<.01%) cycles in affected programs: 1851555 -> 1845472 (-0.33%) helped: 835 HURT: 575 helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14 helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81% HURT stats (abs) min: 1 max: 322 x̄: 48.40 x̃: 14 HURT stats (rel) min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43% 95% mean confidence interval for cycles value: -8.50 -0.13 95% mean confidence interval for cycles %-change: 0.48% 1.62% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Ian Romanick	a83a6e9690	nir/algebraic: Pull common addition out of flrp arguments v2: Augment the late optimization patterns with a couple pre-ffma pass patterns. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15342982 -> 15342485 (<.01%) instructions in affected programs: 56304 -> 55807 (-0.88%) helped: 235 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1 helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74% 95% mean confidence interval for instructions value: -2.31 -1.92 95% mean confidence interval for instructions %-change: -1.46% -1.09% Instructions are helped. total cycles in shared programs: 355734740 -> 355734320 (<.01%) cycles in affected programs: 1028807 -> 1028387 (-0.04%) helped: 134 HURT: 104 helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8 helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61% HURT stats (abs) min: 1 max: 203 x̄: 29.06 x̃: 8 HURT stats (rel) min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46% 95% mean confidence interval for cycles value: -8.51 4.98 95% mean confidence interval for cycles %-change: -0.35% 0.39% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10886815 -> 10886390 (<.01%) instructions in affected programs: 36883 -> 36458 (-1.15%) helped: 147 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23% 95% mean confidence interval for instructions value: -3.12 -2.67 95% mean confidence interval for instructions %-change: -1.83% -1.38% Instructions are helped. total cycles in shared programs: 154188360 -> 154186902 (<.01%) cycles in affected programs: 388094 -> 386636 (-0.38%) helped: 90 HURT: 58 helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15 helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83% HURT stats (abs) min: 1 max: 684 x̄: 31.97 x̃: 10 HURT stats (rel) min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51% 95% mean confidence interval for cycles value: -22.62 2.92 95% mean confidence interval for cycles %-change: -0.68% 0.05% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8221239 -> 8220357 (-0.01%) instructions in affected programs: 54560 -> 53678 (-1.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3 helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17% 95% mean confidence interval for instructions value: -5.21 -4.28 95% mean confidence interval for instructions %-change: -2.23% -1.72% Instructions are helped. total cycles in shared programs: 188654442 -> 188650364 (<.01%) cycles in affected programs: 1454384 -> 1450306 (-0.28%) helped: 204 HURT: 0 helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18 helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22% 95% mean confidence interval for cycles value: -22.38 -17.60 95% mean confidence interval for cycles %-change: -0.67% -0.46% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Christian Gmeiner	4e110eca42	nir: nir_shader_compiler_options: drop native_integers Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 07:35:52 +02:00
Vasily Khoruzhick	443c5a3cd6	nir: add int_to_float lowering pass This new pass lowers ints and bools to floats. It allows hardware that doesn't have native integers (e.g. Mali4x0) use the same code paths as modern hardware. It uses newly introduced pass to gather SSA types and should be used as late as possible. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 01:07:27 +00:00
Karol Herbst	d11b807da5	nir: Add nir_op_vec helper with that we can simplify code where nir vectors are created v2: merge both lines in nir_vec Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-04 12:27:51 +02:00
Karol Herbst	681fb7ea05	nir: Add a nir_builder_alu variant which takes an array of components v2: rename to nir_build_alu_src_arr Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-04 12:27:51 +02:00
Jason Ekstrand	91899495a1	nir: Add a SSA type gathering pass This new pass (which isn't even compile-tested) attempts to determine the ALU type of all the SSA values in a function impl. It takes a greedy approach and assigns intness or floatness to everything it thinks can possibly contain an int or a float. Some values will be labled as both int and float and some will be labled as neither and it is up to the caller to decide what to do with this information. However, for a "nice" shader where the original source contained no bit-casts and no implicit bit-casts were introduced by optimizations, there shouldn't be any overlap in the two sets save for the odd CSEd zero constant. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-04 03:52:05 +00:00
Connor Abbott	d0ea9877b8	nir/algebraic: Don't emit empty initializers for MSVC Just don't emit the transform array at all if there are no transforms v2: - Don't use len(array) > 0 (Dylan) - Keep using ARRAY_SIZE to make the generated C code easier to read (Jason).	2019-05-04 00:13:21 +02:00
Dave Airlie	6fd6246d92	nir: fix lower vars to ssa for larger vector sizes. This has a couple of hardcoded vec4 limits in it, change them to the proper sizing to avoid future issues. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:23:00 +10:00
Rob Clark	b73dd91f60	nir: fix nir tex print harder Fixes: `691d5a825a` nir: rework tex instruction printing Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 15:06:01 -07:00
Rob Clark	a99c360a46	nir: add pass to lower fb reads Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	a2c89a85f4	nir: fix lower_wpos_ytransform in load_frag_coord case Apparently we never hit this path. Or at least haven't for a rather long time. But in either case (load_deref or load_frag_coord), we can just directly use the intrinsic's ssa dest. So stop passing the nir_variable (which would be NULL in the load_frag_coord case) around and instead just use &intr->dest.ssa. (This ofc means we need to setup the cursor to insert after the instruction, which seems to be another bug of the original implementation.) Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	691d5a825a	nir: rework tex instruction printing The extra comma at the end was annoying me. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Connor Abbott	6ec4ed48fc	nir/search: Add debugging code to dump the pattern matched This was useful while debugging the previous commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Connor Abbott	7ce86e6938	nir/search: Add automaton-based pre-searching nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches. It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it. As of now, we get the following binary size before and after this patch: text data bss dec hex filename 11979455 464720 730864 13175039 c908ff before i965_dri.so text data bss dec hex filename 12037835 616244 791792 13445871 cd2aef after i965_dri.so There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel skylake: Difference at 95.0% confidence -42.3485 +/- 1.375 -7.20383% +/- 0.229926% (Student's t, pooled s = 1.69843) We can always experiment with making it more precise later. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Ian Romanick	85e6865ff6	nir: Saturating integer arithmetic is not associative In 8-bits, iadd_sat(iadd_sat(0x7f, 0x7f), -1) = iadd_sat(0x7f, -1) = 0x7e but, iadd_sat(0x7f, iadd_sat(0x7f, -1)) = iadd_sat(0x7f, 0x7e) = 0x7f Fixes: `272e927d0e` ("nir/spirv: initial handling of OpenCL.std extension opcodes") Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-01 09:07:47 -07:00
Jonathan Marek	0c6702cfa5	nir: improve convert_yuv_to_rgb Use a different arrangement of constants to allow more ffma. A vec4 backend will now use 3 fma for yuv_to_rgb. On freedreno/ir3, it is down from 10 to 7 alu (4 fma, 3 mul, 3 add to 7 fma). Other backends shouldn't be hurt. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-01 04:13:36 -07:00
Eric Engestrom	7ca8ba199f	delete autotools .gitignore files One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-29 21:17:19 +00:00
Kenneth Graunke	2b44b27dbe	nir: Add a new nir_cf_list_is_empty_block() helper. Helper and name suggested by Eric Anholt. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-28 22:36:08 -07:00
Andreas Baierl	b82de2b4d7	nir: add rcp(w) lowering for gl_FragCoord On some hardware (e.g. Mali400) the shader needs to apply some transformations for correct gl_FragCoord handling. The lowering actions look like the following in pseudocode: gl_FragCoord.xyz = gl_FragCoord_orig.xyz gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w Add this lowering as a nir pass in preparation for using it in the driver. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-29 02:46:44 +00:00
Tapani Pälli	7a7f182dac	nir: use braces around subobject in initializer Used same syntax as elsewhere with Mesa sources, verified result against MSVC with godbolt.org. fixes following warning with clang: warning: suggest braces around initialization of subobject v2: empty braces -> braces around subobject (Caio, Kristian) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-26 12:01:22 -07:00

... 41 42 43 44 45 ...

3670 commits