fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 09:00:10 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	c31b4420e7	st/nir: Re-vectorize shader IO We scalarize IO to enable further optimizations, such as propagating constant components across shaders, eliminating dead components, and so on. This patch attempts to re-vectorize those operations after the varying optimizations are done. Intel GPUs are a scalar architecture, but IO operations work on whole vec4's at a time, so we'd prefer to have a single IO load per vector rather than 4 scalar IO loads. This re-vectorization can help a lot. Broadcom GPUs, however, really do want scalar IO. radeonsi may want this, or may want to leave it to LLVM. So, we make a new flag in the NIR compiler options struct, and key it off of that, allowing drivers to pick. (It's a bit awkward because we have per-stage settings, but this is about IO between two stages...but I expect drivers to globally prefer one way or the other. We can adjust later if needed.) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-28 01:06:48 -07:00
Jason Ekstrand	f2dc0f2872	nir: Drop imov/fmov in favor of one mov instruction The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Rob Clark <robdclark@chromium.org>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	22421ca7be	nir/builder: Merge nir_[if]mov_alu into one nir_mov_alu helper Unless source modifiers are present, fmov and imov are the same. There's no good reason for having two helpers. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	cd73b6174b	nir/lower_to_source_mods: Stop turning add, sat, and neg into mov Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	2a39788d03	nir/source_mods: Add a helpers for setting source modifiers It's potentially a tiny bit less efficient but the helpers make it much easier to sort out the rules for updating source modifiers. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	ddd08e1888	nir/builder: Remove the use_fmov parameter from nir_swizzle This flag has caused more confusion than good in most cases. You can validly use imov for floats or fmov for integers because, without source modifiers, neither modify their input in any way. Using imov for floats is more reliable so we go that direction. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Dave Airlie	4785e50e75	nir/test: add split vars tests (v2) This just adds some split var splitting tests, it verifies by counting derefs and local vars. a basic load from inputs, store to array, same as before but with a shader temp struct { float } [4] don't split test a basic load from inputs, with some out of band loads. a load/store of only half the array two level array, load from inputs store to all levels a basic load from inputs with an indirect store to array. two level array, indirect store to lvl 0 two level array, indirect store to lvl 1 load from inputs, store to array twice load from input, store to array, load from array, store to another array. load and from input and copy deref to array create wildcard derefs, and do a copy v2: use array_imm helpers, move derefs out of loops, rename toplevel/secondlevel, use ints, fix lvl1 don't split test, rename globabls to shader_temp, add comment, check the derefs type Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-21 13:43:28 +10:00
Caio Marcelo de Oliveira Filho	005cc9ae37	nir: Fix clone of nir_variable state slots When num_state_slots is 0, don't create the array. This was triggering the following assert when running vkcube with NIR_TEST_CLONE=1 vkcube: ../src/compiler/nir/nir_split_per_member_structs.c:66: split_variable: Assertion `var->state_slots == NULL' failed. Fixes: `9fbd390dd4` "nir: Add support for cloning shaders" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-20 16:47:28 -07:00
Caio Marcelo de Oliveira Filho	f051fa6ad7	nir: Add nir_address_format_null_value() Returns the nir_const_value * with the representation of the NULL pointer for each address format. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	6bc9cdb1b7	nir: Add nir_address_format_32bit_offset This is a simple 32-bit address which is not a global address. Gives us a format that don't use 0 as its null pointer value. We will need this in anv to represent nir_var_mem_shared addresses. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	bdaf41107a	nir: Add nir_address_format_logical An address format representing a purely logical addressing model. In this model, all deref chains must be complete from the dereference operation to the variable. Cast derefs are not allowed. These addresses will be 32-bit scalars but the format is immaterial because you can always chase the chain. E.g. push constants in anv. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Dave Airlie	6b2b150a66	nir/validate: fix crash if entry is null. we validate assert entry just before this, but since that doesn't stop execution, we need to check entry before the next validation assert. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-20 16:26:48 +10:00
Caio Marcelo de Oliveira Filho	ded2c202d5	nir: Only convert SSA values to regs when needed If the SSA def produced by this instruction is only in the block in which it is defined and is not used by ifs or phis, then we don't have a reason to convert it to a register in nir_lower_ssa_defs_to_regs_block(). The special case for derefs is covered by the general case, so can be removed: at this point all derefs in the block are materialized (i.e. the whole deref chain is in the block) and derefs are not used in phis. v2: Fix wrong check for if_uses. If there's such an use, the def is not "local_to_block". (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 12:23:47 -07:00
Caio Marcelo de Oliveira Filho	8a995f2b5e	nir: Fix nir_opt_idiv_const when negatives are involved First, allow the case for negative powers of two. Then ensure that we use the absolute value of the non-constant value to calculate the quotient -- this was hinted in the code by the name 'uq'. This fixes an issue when 'd' is positive and 'n' is negative. The ishr will propagate the negative sign and we'll use nir_ineg() again, incorrectly. v2: First version used only ishr, but that isn't sufficient, since it never can produce a zero as a result. (Jason) Allow negative powers of two. (Caio) Fixes: `74492ebad9` "nir: Add a pass for lowering integer division by constants" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 10:55:03 -07:00
Lionel Landwerlin	e04cf0b612	nir: lower_non_uniform_access: iterate over instructions safely This pass moves instructions around and adds control-flow in the middle of blocks. We need to use nir_foreach_instr_safe to ensure that we iterate over instructions correctly anyway. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bd5457641` ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 10:22:01 +01:00
Alyssa Rosenzweig	46494c3dc1	nir/algebraic: Remove problematic "optimization" This line is no longer relevant now that booleans are 1-bit, and in fact causes issues (infinite progress loop between algebraic optimizations and copy prop) with constant vector masks. No shader-db changes on Intel platforms (Jason). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2019-05-16 02:08:37 +00:00
Anuj Phogat	a42163cbbc	compiler: Add lowering support for 64-bit saturate operations to software Fixes 7 Khronos GL CTS tests: KHR-GL45.gpu_shader_fp64.builtin.smoothstep_dvec{double, 2, 3, 4} KHR-GL45.gpu_shader_fp64.builtin.smoothstep_against_scalar_dvec{2, 3, 4} Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-15 23:30:30 +00:00
Lionel Landwerlin	391a836e8f	nir: fix lower_non_uniform_access pass Obviously missing the instruction insertion into the SSA list. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bd5457641` ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 18:15:20 +00:00
Ian Romanick	32d259713b	nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions The goal is to avoid having an extra MOV instruction to perform the saturate. Doing the subtraction first allows the saturate to be applied to the ADD instruction making the MOV unnecessary. Values generated in different block and values from non-ALU instructions (e.g., texture instructions) almost always need the extra MOV. Multiply instructions are restricted because doing this rearrangement can interfere with the generation of flrp and ffma instructions. v2: Now that the final method has been selected, squash three commits into one. All Intel platforms has similar results. (Ice Lake shown) total instructions in shared programs: 17223214 -> 17219386 (-0.02%) instructions in affected programs: 1524376 -> 1520548 (-0.25%) helped: 2686 HURT: 26 helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37% HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2 HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35% 95% mean confidence interval for instructions value: -1.46 -1.36 95% mean confidence interval for instructions %-change: -0.56% -0.50% Instructions are helped. total cycles in shared programs: 360811571 -> 360791896 (<.01%) cycles in affected programs: 103650214 -> 103630539 (-0.02%) helped: 1557 HURT: 675 helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16 helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64% HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14 HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49% 95% mean confidence interval for cycles value: -14.82 -2.81 95% mean confidence interval for cycles %-change: -0.50% -0.20% Cycles are helped. LOST: 2 GAINED: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
Ian Romanick	a7f0c57673	nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1) v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224337 -> 17224269 (<.01%) instructions in affected programs: 13578 -> 13510 (-0.50%) helped: 68 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.05% -0.63% Instructions are helped. total cycles in shared programs: 360826090 -> 360825137 (<.01%) cycles in affected programs: 94867 -> 93914 (-1.00%) helped: 58 HURT: 1 helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18 helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22% HURT stats (abs) min: 76 max: 76 x̄: 76.00 x̃: 76 HURT stats (rel) min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86% 95% mean confidence interval for cycles value: -19.53 -12.78 95% mean confidence interval for cycles %-change: -1.56% -1.08% Cycles are helped. No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
Ian Romanick	281f20e26d	nir/algebraic: Strip double negatives from comparison sources All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224623 -> 17224337 (<.01%) instructions in affected programs: 32648 -> 32362 (-0.88%) helped: 148 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2 helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08% 95% mean confidence interval for instructions value: -1.97 -1.89 95% mean confidence interval for instructions %-change: -1.15% -1.00% Instructions are helped. total cycles in shared programs: 360828714 -> 360826090 (<.01%) cycles in affected programs: 347416 -> 344792 (-0.76%) helped: 148 HURT: 26 helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18 helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41% HURT stats (abs) min: 2 max: 337 x̄: 48.96 x̃: 6 HURT stats (rel) min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27% 95% mean confidence interval for cycles value: -23.78 -6.38 95% mean confidence interval for cycles %-change: -1.59% -0.79% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	45c7ff95fc	intel/compiler: Repeat nir_opt_algebraic_late A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles. v2: Fix typo in comment. Noticed by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224634 -> 17224623 (<.01%) instructions in affected programs: 4586 -> 4575 (-0.24%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.36% -0.19% Instructions are helped. total cycles in shared programs: 360828542 -> 360828714 (<.01%) cycles in affected programs: 151159 -> 151331 (0.11%) helped: 49 HURT: 28 helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6 helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42% HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15 HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88% 95% mean confidence interval for cycles value: -13.48 17.95 95% mean confidence interval for cycles %-change: -0.69% 0.84% Inconclusive result (value mean confidence interval includes 0). Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13529544 -> 13529542 (<.01%) instructions in affected programs: 358 -> 356 (-0.56%) helped: 2 HURT: 0 total cycles in shared programs: 357290311 -> 357289678 (<.01%) cycles in affected programs: 178324 -> 177691 (-0.35%) helped: 48 HURT: 40 helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13 helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66% HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6 HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31% 95% mean confidence interval for cycles value: -18.28 3.89 95% mean confidence interval for cycles %-change: -1.01% 0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8159110 -> 8158980 (<.01%) instructions in affected programs: 22719 -> 22589 (-0.57%) helped: 65 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74% 95% mean confidence interval for instructions value: -2.06 -1.94 95% mean confidence interval for instructions %-change: -0.78% -0.68% Instructions are helped. total cycles in shared programs: 188609448 -> 188609214 (<.01%) cycles in affected programs: 1875852 -> 1875618 (-0.01%) helped: 109 HURT: 104 helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4 helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07% HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2 HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for cycles value: -1.95 -0.25 95% mean confidence interval for cycles %-change: -0.04% -0.01% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	d2a9ba03e3	Revert "nir: add late opt to turn inot/b2f combos back to bcsel" This reverts commit `7acc865226`. With these optimizations in place, the extra constant folding added in the next commit extends some live ranges of 0.0 and ±1.0 constants, and that causes several hundred shaders to have more spills and fills. I believe this optimization we made basically irrelevant by `7725d60938` "intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))". All Gen7.5+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225303 -> 17224634 (<.01%) instructions in affected programs: 879402 -> 878733 (-0.08%) helped: 679 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.02 -0.95 95% mean confidence interval for instructions %-change: -0.26% -0.22% Instructions are helped. total cycles in shared programs: 360842595 -> 360828542 (<.01%) cycles in affected programs: 110443594 -> 110429541 (-0.01%) helped: 389 HURT: 265 helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28 helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11% HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48 HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10% 95% mean confidence interval for cycles value: -75.65 32.67 95% mean confidence interval for cycles %-change: -0.49% -0.06% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 12159 -> 12161 (0.02%) spills in affected programs: 13 -> 15 (15.38%) helped: 0 HURT: 1 total fills in shared programs: 25207 -> 25208 (<.01%) fills in affected programs: 25 -> 26 (4.00%) helped: 0 HURT: 1 Ivy Bridge total instructions in shared programs: 12082019 -> 12082013 (<.01%) instructions in affected programs: 1033 -> 1027 (-0.58%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.78% -0.45% Instructions are helped. total cycles in shared programs: 179849270 -> 179849157 (<.01%) cycles in affected programs: 4735 -> 4622 (-2.39%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18 helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36% 95% mean confidence interval for cycles value: -82.73 26.23 95% mean confidence interval for cycles %-change: -7.98% 2.28% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10882750 -> 10882748 (<.01%) instructions in affected programs: 266 -> 264 (-0.75%) helped: 2 HURT: 0 Iron Lake total cycles in shared programs: 188609440 -> 188609448 (<.01%) cycles in affected programs: 4320 -> 4328 (0.19%) helped: 0 HURT: 2 GM45 total cycles in shared programs: 129016868 -> 129016872 (<.01%) cycles in affected programs: 2302 -> 2306 (0.17%) helped: 0 HURT: 1 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3cb091f8b4	nir/algebraic: Eliminate a tautological compare The value-range tracking pass that is coming is not clever enough to know that the result of the ffma must be non-negative. Making it that smart will require quite a bit of work. It might be possible to add a special case that detects that a whole tree of fadd(fmul(fsat(a), fneg(fsat(a))), 1.0) cannot be negative. For cases when the comparison is used in the domain guard for a square-root (see nir/algebraic: Simplify fsqrt domain guard), the compare may be converted to a fmax. This patch also handles that case. All of the affected cases are in DiRT: Showdown. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225365 -> 17225303 (<.01%) instructions in affected programs: 40051 -> 39989 (-0.15%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.31% -0.22% Instructions are helped. total cycles in shared programs: 360842788 -> 360842595 (<.01%) cycles in affected programs: 1818081 -> 1817888 (-0.01%) helped: 29 HURT: 22 helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14 helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42% HURT stats (abs) min: 1 max: 108 x̄: 18.45 x̃: 7 HURT stats (rel) min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19% 95% mean confidence interval for cycles value: -14.48 6.91 95% mean confidence interval for cycles %-change: -0.71% 0.21% Inconclusive result (value mean confidence interval includes 0). No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	9725e45b3d	nir/algebraic: Simplify fsqrt domain guard All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228376 -> 17225365 (-0.02%) instructions in affected programs: 280732 -> 277721 (-1.07%) helped: 1072 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2 helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07% 95% mean confidence interval for instructions value: -2.92 -2.70 95% mean confidence interval for instructions %-change: -1.48% -1.37% Instructions are helped. total cycles in shared programs: 360935690 -> 360842788 (-0.03%) cycles in affected programs: 7838017 -> 7745115 (-1.19%) helped: 1569 HURT: 69 helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20 helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12% HURT stats (abs) min: 1 max: 2820 x̄: 98.22 x̃: 47 HURT stats (rel) min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31% 95% mean confidence interval for cycles value: -63.55 -49.89 95% mean confidence interval for cycles %-change: -3.33% -2.96% Cycles are helped. No changes on any other platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	e2ad047779	nir/search: Don't compare 8-bit or 1-bit constants with floats Without this, adding an algebraic rule like (('bcsel', ('flt', a, 0.0), 0.0, ...), ...), will cause assertion failures inside nir_src_comp_as_float in GTF-GL46.gtf21.GL.lessThan.lessThan_vec3_frag (and related tests) from the OpenGL CTS and shaders/closed/steam/witcher-2/511.shader_test from shader-db. All of these cases have some code that ends up like ('bcsel', ('flt', a, 0.0), 'b@1', ...) When the 'b@1' is tested, nir_src_comp_as_float fails because there's no such thing as a 1-bit float. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	5116646a76	nir/algebraic: Recognize open-coded fsat with modifiers This change also enables a later change (nir/algebraic: Replace 1-fsat(a) with fsat(1-a)) to affect more shaders. Almost all of the affected shaders are in Bioshock Infinite, and all of those shaders all require GLSL 4.10. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228584 -> 17228376 (<.01%) instructions in affected programs: 31438 -> 31230 (-0.66%) helped: 105 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1 helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70% 95% mean confidence interval for instructions value: -2.20 -1.76 95% mean confidence interval for instructions %-change: -0.80% -0.67% Instructions are helped. total cycles in shared programs: 360936431 -> 360935690 (<.01%) cycles in affected programs: 420100 -> 419359 (-0.18%) helped: 71 HURT: 21 helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10 helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48% HURT stats (abs) min: 1 max: 198 x̄: 29.90 x̃: 10 HURT stats (rel) min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90% 95% mean confidence interval for cycles value: -16.77 0.66 95% mean confidence interval for cycles %-change: -0.85% -0.06% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	c769641c8e	nir/algebraic: Push unary operations into source operands of fsat source Pushing a unary operation, like fneg, into the operation that generates its operand allows the fsat to be applied to the inner instruction instead of on a separate instruction that performs the unary operation. This changes fmul ssa_100, ssa_99, ssa_98 fmov.sat ssa_101, -ssa_100 into fmul.sat ssa_100, -ssa_99, ssa_98 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 17228658 -> 17228584 (<.01%) instructions in affected programs: 3163 -> 3089 (-2.34%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2 helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51% 95% mean confidence interval for instructions value: -1.66 -1.37 95% mean confidence interval for instructions %-change: -4.37% -3.00% Instructions are helped. total cycles in shared programs: 360937144 -> 360936431 (<.01%) cycles in affected programs: 24029 -> 23316 (-2.97%) helped: 47 HURT: 2 helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16 helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for cycles value: -16.05 -13.05 95% mean confidence interval for cycles %-change: -4.07% -3.15% Cycles are helped. All Gen7 and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13536059 -> 13535884 (<.01%) instructions in affected programs: 8797 -> 8622 (-1.99%) helped: 150 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96% 95% mean confidence interval for instructions value: -1.23 -1.11 95% mean confidence interval for instructions %-change: -3.97% -3.05% Instructions are helped. total cycles in shared programs: 357696119 -> 357694193 (<.01%) cycles in affected programs: 50216 -> 48290 (-3.84%) helped: 109 HURT: 14 helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16 helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37% HURT stats (abs) min: 2 max: 26 x̄: 10.14 x̃: 5 HURT stats (rel) min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92% 95% mean confidence interval for cycles value: -19.27 -12.05 95% mean confidence interval for cycles %-change: -7.34% -5.31% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3b74790941	nir/algebraic: Recognize open-coded flrp(a, b, fsat(c)) All Gen6+ GPUs had similar results. (Skylake shown) total instructions in shared programs: 15336712 -> 15336622 (<.01%) instructions in affected programs: 3952 -> 3862 (-2.28%) helped: 24 HURT: 0 helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4 helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46% 95% mean confidence interval for instructions value: -4.06 -3.44 95% mean confidence interval for instructions %-change: -2.47% -2.22% Instructions are helped. total cycles in shared programs: 355722052 -> 355721235 (<.01%) cycles in affected programs: 27326 -> 26509 (-2.99%) helped: 20 HURT: 4 helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14 helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23% HURT stats (abs) min: 2 max: 64 x̄: 19.50 x̃: 6 HURT stats (rel) min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55% 95% mean confidence interval for cycles value: -61.61 -6.47 95% mean confidence interval for cycles %-change: -5.59% -0.39% Cycles are helped. No changes on Ice Lake, Iron Lake, or GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:21 -07:00
Ian Romanick	a7724b1cbb	nir/algebraic: Add missing ffma(-1, a, b) pattern All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17229439 -> 17229377 (<.01%) instructions in affected programs: 9859 -> 9797 (-0.63%) helped: 41 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67% 95% mean confidence interval for instructions value: -1.88 -1.14 95% mean confidence interval for instructions %-change: -2.48% -0.81% Instructions are helped. total cycles in shared programs: 360944145 -> 360942989 (<.01%) cycles in affected programs: 178167 -> 177011 (-0.65%) helped: 36 HURT: 19 helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5 helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45% HURT stats (abs) min: 1 max: 34 x̄: 11.21 x̃: 6 HURT stats (rel) min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50% 95% mean confidence interval for cycles value: -36.01 -6.02 95% mean confidence interval for cycles %-change: -4.18% -0.57% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:03 -07:00
Ian Romanick	7b4ff6a1af	nir: Mark ffma as 2src_commutative This doesn't make any real difference now, but future work (not in this series) will add a LOT of ffma patterns. Having to duplicate all of them for ffma(a, b, c) and ffma(b, a, c) is just terrible. No shader-db changes on any Intel platform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	e049a9c92b	nir: Add support for 2src_commutative ops that have 3 sources v2: Instead of handling 3 sources as a special case, generalize with loops to N sources. Suggested by Jason. v3: Further generalize by only checking that number of sources is >= 2. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	ede45bf9cf	nir: Rename commutative to 2src_commutative The meaning of the new name is that the first two sources are commutative. Since this is only currently applied to two-source operations, there is no change. A future change will mark ffma as 2src_commutative. It is also possible that future work will add 3src_commutative for opcodes like fmin3. v2: s/commutative_2src/2src_commutative/g. I had originally considered this, but I discarded it because I did't want to deal with identifiers that (should) start with 2. Jason suggested it in review, so we decided that _2src_commutative would be used in nir_opcodes.py. Also add some comments documenting what 2src_commutative means. Also suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Jason Ekstrand	712f99934c	nir/validate: Use a single set for SSA def validation The current SSA def validation we do in nir_validate validates three things: 1. That each SSA def is only ever used in the function in which it is defined. 2. That an nir_src exists in an SSA def's use list if and only if it points to that SSA def. 3. That each nir_src is in the correct use list (uses or if_uses) based on whether it's an if condition or not. The way we were doing this before was that we had a hash table which provided a map from SSA def to a small ssa_def_validate_state data structure which contained a pointer to the nir_function_impl and two hash sets, one for each use list. This meant piles of allocation and creating of little hash sets. It also meant one hash lookup for each SSA def plus one per use as well as two per src (because we have to look up the ssa_def_validate_state and then look up the use.) It also involved a second walk over the instructions as a post-validate step. This commit changes us to use a single low-collision hash set of SSA sources for all of this by being a bit more clever. We accomplish the objectives above as follows: 1. The list is clear when we start validating a function. If the nir_src references an SSA def which is defined in a different function, it simply won't be in the set. 2. When validating the SSA defs, we walk the uses and verify that they have is_ssa set and that the SSA def points to the SSA def we're validating. This catches the case of a nir_src being in the wrong list. We then put the nir_src in the set and, when we validate the nir_src, we assert that it's in the set. This takes care of any cases where a nir_src isn't in the use list. After checking that the nir_src is in the set, we remove it from the set and, at the end of nir_function_impl validation, we assert that the set is empty. This takes care of any cases where a nir_src is in a use list but the instruction is no longer in the shader. 3. When we put a nir_src in the set, we set the bottom bit of the pointer to 1 if it's the condition of an if. This lets us detect whether or not a nir_src is in the right list. When running shader-db with an optimized debug build of mesa on my laptop, I get the following shader-db CPU times: With NIR_VALIDATE=0 3033.34 seconds Before this commit 20224.83 seconds After this commit 6255.50 seconds Assuming shader-db is a representative sampling of GLSL shaders, this means that making this change yields an 81% reduction in the time spent in nir_validate. It still isn't cheap but enabling validation now only increases compile times by 2x instead of 6.6x. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Jason Ekstrand	460567eabf	nir/validate: Use a ralloc context for our temporary data All of our hash tables and sets are already using ralloc. There's really no good reason why we don't just make a ralloc context rather than try to remember to clean everything up manually. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Ruslan Kabatsayev	974c4d679c	nir: Fix wrong sign in lower_rcp The nested fma calls were supposed to implement x_new = x + x * (1 - xsrc), but instead current code is equivalent to x_new = x - x (1 - x*src). The result is that Newton-Raphson steps don't improve precision at all. This patch fixes this problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-11 09:25:22 -07:00
Alyssa Rosenzweig	006cafc243	nir: Add blend_const_color_rgba sysval This represents a float vec4 constant color, as passed to glBlendColor. While the existing 4 shader sysvals are retained to minimize code churn, a single vectorized intrinsic is required for efficient blending on vector architectures. (This may also apply to archictectures like Bifrost where ALU is scalar but load/store is vector; it largely depends on how blending is implemented per-driver.) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:49:28 +00:00
Jonathan Marek	d0bff89159	nir: allow specifying a set of opcodes in lower_alu_to_scalar This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:10:41 +00:00
Ian Romanick	ed5f024515	nir/flrp: Reassociate add in flrp(±1, b, c) lowering path With this reassociation, this lowering path is still beneficial. Ice Lake total instructions in shared programs: 17220191 -> 17207181 (-0.08%) instructions in affected programs: 999871 -> 986861 (-1.30%) helped: 3703 HURT: 17 helped stats (abs) min: 1 max: 686 x̄: 3.52 x̃: 3 helped stats (rel) min: 0.09% max: 51.97% x̄: 2.21% x̃: 1.35% HURT stats (abs) min: 1 max: 9 x̄: 1.47 x̃: 1 HURT stats (rel) min: 0.08% max: 4.55% x̄: 0.78% x̃: 0.55% 95% mean confidence interval for instructions value: -4.01 -2.99 95% mean confidence interval for instructions %-change: -2.29% -2.11% Instructions are helped. total cycles in shared programs: 360871298 -> 360755040 (-0.03%) cycles in affected programs: 9931334 -> 9815076 (-1.17%) helped: 2388 HURT: 1569 helped stats (abs) min: 1 max: 10228 x̄: 93.54 x̃: 18 helped stats (rel) min: <.01% max: 74.11% x̄: 3.36% x̃: 1.07% HURT stats (abs) min: 1 max: 1917 x̄: 68.27 x̃: 22 HURT stats (rel) min: <.01% max: 44.90% x̄: 3.44% x̃: 1.72% 95% mean confidence interval for cycles value: -39.48 -19.28 95% mean confidence interval for cycles %-change: -0.86% -0.46% Cycles are helped. total spills in shared programs: 12355 -> 12159 (-1.59%) spills in affected programs: 295 -> 99 (-66.44%) helped: 2 HURT: 1 total fills in shared programs: 25398 -> 25207 (-0.75%) fills in affected programs: 288 -> 97 (-66.32%) helped: 2 HURT: 1 LOST: 3 GAINED: 44 Iron Lake total instructions in shared programs: 8169225 -> 8159729 (-0.12%) instructions in affected programs: 1025712 -> 1016216 (-0.93%) helped: 3352 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.83 x̃: 3 helped stats (rel) min: 0.15% max: 12.00% x̄: 1.51% x̃: 1.05% 95% mean confidence interval for instructions value: -2.86 -2.80 95% mean confidence interval for instructions %-change: -1.56% -1.46% Instructions are helped. total cycles in shared programs: 188656796 -> 188612280 (-0.02%) cycles in affected programs: 18633584 -> 18589068 (-0.24%) helped: 3085 HURT: 14 helped stats (abs) min: 2 max: 72 x̄: 14.45 x̃: 12 helped stats (rel) min: 0.02% max: 5.73% x̄: 0.73% x̃: 0.31% HURT stats (abs) min: 2 max: 4 x̄: 3.71 x̃: 4 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -14.55 -14.18 95% mean confidence interval for cycles %-change: -0.76% -0.69% Cycles are helped. GM45 total instructions in shared programs: 5026905 -> 5021856 (-0.10%) instructions in affected programs: 584169 -> 579120 (-0.86%) helped: 1776 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.84 x̃: 3 helped stats (rel) min: 0.15% max: 11.11% x̄: 1.43% x̃: 0.98% 95% mean confidence interval for instructions value: -2.88 -2.80 95% mean confidence interval for instructions %-change: -1.50% -1.37% Instructions are helped. total cycles in shared programs: 129047376 -> 129018918 (-0.02%) cycles in affected programs: 12941924 -> 12913466 (-0.22%) helped: 1722 HURT: 14 helped stats (abs) min: 4 max: 72 x̄: 16.56 x̃: 18 helped stats (rel) min: 0.02% max: 5.73% x̄: 0.72% x̃: 0.30% HURT stats (abs) min: 2 max: 4 x̄: 3.71 x̃: 4 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -16.65 -16.13 95% mean confidence interval for cycles %-change: -0.76% -0.66% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-08 07:41:54 -07:00
Ian Romanick	ba203a3cd7	nir/flrp: Fix typo on the flrp(±1, b, c) path After Samuel reported the bisect, I was able to find the bug by inspection. Good thing for well-named varibles. :) Unfortunately, this undoes almost all of the benefit of the original patch. Ice Lake total instructions in shared programs: 17183159 -> 17218166 (0.20%) instructions in affected programs: 1308722 -> 1343729 (2.67%) helped: 98 HURT: 4746 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.47% max: 2.70% x̄: 0.60% x̃: 0.57% HURT stats (abs) min: 1 max: 691 x̄: 7.40 x̃: 8 HURT stats (rel) min: 0.10% max: 700.00% x̄: 5.82% x̃: 2.83% 95% mean confidence interval for instructions value: 6.82 7.64 95% mean confidence interval for instructions %-change: 5.22% 6.15% Instructions are HURT. total cycles in shared programs: 360705959 -> 360853522 (0.04%) cycles in affected programs: 10754380 -> 10901943 (1.37%) helped: 1594 HURT: 3331 helped stats (abs) min: 1 max: 1896 x̄: 119.81 x̃: 60 helped stats (rel) min: <.01% max: 35.48% x̄: 5.06% x̃: 3.64% HURT stats (abs) min: 1 max: 10208 x̄: 101.63 x̃: 38 HURT stats (rel) min: 0.01% max: 878.95% x̄: 9.01% x̃: 2.78% 95% mean confidence interval for cycles value: 21.11 38.81 95% mean confidence interval for cycles %-change: 3.76% 5.15% Cycles are HURT. total spills in shared programs: 12158 -> 12355 (1.62%) spills in affected programs: 98 -> 295 (201.02%) helped: 1 HURT: 2 total fills in shared programs: 25204 -> 25398 (0.77%) fills in affected programs: 94 -> 288 (206.38%) helped: 0 HURT: 3 LOST: 15 GAINED: 8 Iron Lake total instructions in shared programs: 8121430 -> 8166733 (0.56%) instructions in affected programs: 1148353 -> 1193656 (3.95%) helped: 2 HURT: 4046 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.85% max: 1.92% x̄: 1.89% x̃: 1.89% HURT stats (abs) min: 1 max: 43 x̄: 11.20 x̃: 11 HURT stats (rel) min: 0.20% max: 716.67% x̄: 7.40% x̃: 3.87% 95% mean confidence interval for instructions value: 11.02 11.37 95% mean confidence interval for instructions %-change: 6.84% 7.94% Instructions are HURT. total cycles in shared programs: 188376326 -> 188601568 (0.12%) cycles in affected programs: 27416674 -> 27641916 (0.82%) helped: 68 HURT: 3947 helped stats (abs) min: 2 max: 222 x̄: 13.88 x̃: 6 helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01% HURT stats (abs) min: 2 max: 670 x̄: 57.31 x̃: 64 HURT stats (rel) min: <.01% max: 1811.11% x̄: 4.11% x̃: 1.09% 95% mean confidence interval for cycles value: 55.01 57.20 95% mean confidence interval for cycles %-change: 2.88% 5.19% Cycles are HURT. LOST: 35 GAINED: 3 GM45 total instructions in shared programs: 4979794 -> 5003551 (0.48%) instructions in affected programs: 635174 -> 658931 (3.74%) helped: 1 HURT: 2142 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.85% max: 1.85% x̄: 1.85% x̃: 1.85% HURT stats (abs) min: 1 max: 43 x̄: 11.09 x̃: 11 HURT stats (rel) min: 0.20% max: 716.67% x̄: 7.00% x̃: 3.53% 95% mean confidence interval for instructions value: 10.85 11.33 95% mean confidence interval for instructions %-change: 6.25% 7.74% Instructions are HURT. total cycles in shared programs: 128519586 -> 128654990 (0.11%) cycles in affected programs: 17635304 -> 17770708 (0.77%) helped: 46 HURT: 2088 helped stats (abs) min: 4 max: 220 x̄: 18.13 x̃: 6 helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01% HURT stats (abs) min: 2 max: 670 x̄: 65.25 x̃: 66 HURT stats (rel) min: <.01% max: 1464.29% x̄: 4.05% x̃: 0.99% 95% mean confidence interval for cycles value: 61.75 65.15 95% mean confidence interval for cycles %-change: 2.58% 5.34% Cycles are HURT. LOST: 38 GAINED: 38 Fixes: `5b908db604` ("nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently") Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-08 07:41:26 -07:00
Vasily Khoruzhick	e67e4e90b2	nir: implement lowering for fsin and fcos Lower sin and cos using Nick's fast sin/cos approximation from https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648 It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 15:25:21 +00:00
Ian Romanick	ab86926156	nir/algebraic: Reassociate open-coded flrp(1, b, c) In a previous verion of this patch, Jason commented, "Re-associating based on whether or not something has a constant value of 1.0 seems a bit sneaky. I think it's well within the rules but it seems like something that could bite you." That is possibly true. The reassociation will generate different results if fabs(b) >= 2**24 and fabs(c) < 0.5. The delta increases as fabs(c) approaches 0. However, i965 has done this same reassociation indirectly for years. We would previously allow nir_op_flrp on all pre-Gen11 hardware even though Gen4 and Gen5 do not have a LRP instruction. Optimizations in nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a, b, c). On Gen7+, the hardware performs the same arithmetic as a(1-c)+bc. Gen6 seems to implement LRP as a+c(b-a). On Gen4 and Gen5, we would lower LRP to a sequence of instructions that implement a(1-c)+bc. The lowering happens after all constant folding, so we would litterally generate a 1+(-1) instruction sequence in this scenario: one instruction to load either 1 or -1 in a register, and another instruction to add either -1 or 1 to it. This patch just cuts out the middle man. Do the reassociation that we've always done, but do it explicitly at a time when we can benefit from other optimizations. A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently" are restored by this patch. This includes a few shaders in ET:QW. I tried a similar thing for open-coded flrp(-1, b, c), and it hurt instructions on 35 shaders for ILK without helping any. The helped / hurt cycles was about even. No changes on any other Intel platforms. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8172020 -> 8164367 (-0.09%) instructions in affected programs: 1089851 -> 1082198 (-0.70%) helped: 3285 HURT: 64 helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2 helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38% 95% mean confidence interval for instructions value: -2.32 -2.25 95% mean confidence interval for instructions %-change: -1.16% -1.09% Instructions are helped. total cycles in shared programs: 188758338 -> 188719974 (-0.02%) cycles in affected programs: 20004922 -> 19966558 (-0.19%) helped: 3012 HURT: 477 helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12 helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24% HURT stats (abs) min: 2 max: 328 x̄: 4.27 x̃: 4 HURT stats (rel) min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11% 95% mean confidence interval for cycles value: -11.38 -10.62 95% mean confidence interval for cycles %-change: -0.46% -0.41% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	c995d1ca3a	nir/flrp: Lower flrp(a, b, #c) differently This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	ae02622d8f	nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists There is little effect on Intel GPUs now because we almost always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". No changes on any other Intel platforms. GM45 and Iron Lake had similar results. (Iron Lake shown) total cycles in shared programs: 188852500 -> 188852484 (<.01%) cycles in affected programs: 14612 -> 14596 (-0.11%) helped: 4 HURT: 0 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11% 95% mean confidence interval for cycles value: -4.00 -4.00 95% mean confidence interval for cycles %-change: -0.13% -0.09% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	6698d861a5	nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". No changes on any Intel platform. Before a number of large rebases this helped cycles in a couple shaders on Iron Lake and GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	5b908db604	nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently No changes on any other Intel platforms. v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8189888 -> 8153912 (-0.44%) instructions in affected programs: 1199037 -> 1163061 (-3.00%) helped: 4124 HURT: 10 helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9 helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02% HURT stats (abs) min: 1 max: 2 x̄: 1.20 x̃: 1 HURT stats (rel) min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06% 95% mean confidence interval for instructions value: -8.84 -8.56 95% mean confidence interval for instructions %-change: -5.12% -4.77% Instructions are helped. total cycles in shared programs: 188606710 -> 188426964 (-0.10%) cycles in affected programs: 27505596 -> 27325850 (-0.65%) helped: 4026 HURT: 77 helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46 helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85% HURT stats (abs) min: 2 max: 376 x̄: 17.79 x̃: 6 HURT stats (rel) min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04% 95% mean confidence interval for cycles value: -44.75 -42.87 95% mean confidence interval for cycles %-change: -2.44% -2.17% Cycles are helped. LOST: 3 GAINED: 35 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	23c5501b77	nir/flrp: Lower flrp(#a, #b, c) differently If the magnitudes of #a and #b are such that (b-a) won't lose too much precision, lower as a+c(b-a). No changes on any other Intel platforms. v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8192503 -> 8192383 (<.01%) instructions in affected programs: 18417 -> 18297 (-0.65%) helped: 68 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43% 95% mean confidence interval for instructions value: -2.48 -1.05 95% mean confidence interval for instructions %-change: -1.56% -0.63% Instructions are helped. total cycles in shared programs: 188662536 -> 188661956 (<.01%) cycles in affected programs: 744476 -> 743896 (-0.08%) helped: 62 HURT: 0 helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6 helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06% 95% mean confidence interval for cycles value: -12.37 -6.34 95% mean confidence interval for cycles %-change: -0.48% -0.06% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	158370ed2a	nir/flrp: Add new lowering pass for flrp instructions This pass will soon grow to include some optimizations that are difficult or impossible to implement correctly within nir_opt_algebraic. It also include the ability to generate strictly correct code which the current nir_opt_algebraic lowering lacks (though that could be changed). v2: Document the parameters to nir_lower_flrp. Rebase on top of `3766334923` ("compiler/nir: add lowering for 16-bit flrp") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Ian Romanick	dc566a033c	nir/algebraic: Pull common multiplication out of flrp arguments All Intel platforms had similar results. (Skylake shown) total instructions in shared programs: 15342485 -> 15337495 (-0.03%) instructions in affected programs: 217456 -> 212466 (-2.29%) helped: 1539 HURT: 1 helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3 helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56% 95% mean confidence interval for instructions value: -3.39 -3.09 95% mean confidence interval for instructions %-change: -3.24% -2.96% Instructions are helped. total cycles in shared programs: 355734320 -> 355728237 (<.01%) cycles in affected programs: 1851555 -> 1845472 (-0.33%) helped: 835 HURT: 575 helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14 helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81% HURT stats (abs) min: 1 max: 322 x̄: 48.40 x̃: 14 HURT stats (rel) min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43% 95% mean confidence interval for cycles value: -8.50 -0.13 95% mean confidence interval for cycles %-change: 0.48% 1.62% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00

... 13 14 15 16 17 ...

2290 commits