fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-17 22:38:06 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	1f60f1aa3d	nir/gcm: Use an array for storing the early block We are about to adjust our instruction block assignment algorithm and we will want to know the current block that the instruction lives in. In order to allow for this, we can't overwrite nir_instr::block in the early scheduling pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>	2020-04-20 03:46:29 +00:00
Jason Ekstrand	6006a9e275	nir/gcm: Loop over blocks in pin_instructions Now that we have the new block iterators, we can simplify things a bit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>	2020-04-20 03:46:29 +00:00
Jason Ekstrand	4d083b52c0	nir/dominance: Better handle unreachable blocks v2: Fix minor comments (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>	2020-04-20 03:46:29 +00:00
Arcady Goldmints-Orlov	ec1b96fdc8	nir: Lower returns correctly inside nested loops Inside nested flow control, nir_lower_returns inserts predicated breaks in the outer block. However, it would omit doing this if the remainder of the outer block (after the inner block) was empty. This is not correct in the case of loops, as execution just wraps back around to the start of the loop, so this change doesn't skip the predication inside loops. Fixes: `79dec93ead` (nir: Add return lowering pass) Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2724 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4603>	2020-04-19 02:54:08 +00:00
Timothy Arceri	c19ebca308	nir: add matrix_layout to nir_variable data This will be used by the following patch. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4623>	2020-04-18 11:50:44 +00:00
Jason Ekstrand	f5deed138a	spirv,nir: Move the SPIR-V vector insert code to NIR This also makes spirv_to_nir a bit simpler because the new nir_vector_insert helper automatically handles a constant component selector like nir_vector_extract does. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4495>	2020-04-17 19:21:44 +00:00
Jason Ekstrand	acaccff4d3	nir/builder: Handle any bit-size selector in nir_extract Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4495>	2020-04-17 19:21:44 +00:00
Jason Ekstrand	9b17d7caac	nir: Add some sanity assertions in opt_large_constants We make some assumptions in opt_large_constants such as the size_align function returning the obvious sizes for vectors. Now that we've got the deref_size lying around, we may as well assert it's consistent with our assumptions. In particular, we now assert that it really claims booleans are 32-bit. If anyone's driver ever decides to be clever and change this, we'll now catch the breakage earlier. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>	2020-04-16 17:00:13 +00:00
Jason Ekstrand	33eb43349e	nir: Add an alignment to nir_intrinsic_load_constant In `f1883cc73d` we tried to pass through alignments from load_constant intrinsics when rewriting them to load_ubo in iris. However, those intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices. It's easy enough to add them. We just call the size/align function on the vector type at the end of our deref chain and use the alignment returned from there. It's possible we could do better by walking the whole deref chain but this should be good enough. Fixes: `f1883cc73d` "iris: Set alignments on cbuf0 and constant reads" Closes: #2739 Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>	2020-04-16 17:00:13 +00:00
Connor Abbott	abcfb64370	ir3: Fix LDC offset units I had missed that LDC actually uses vec4 units for its offset. This means that we have to create a new instruction, and lower it in ir3_nir_lower_io_offsets, similar to the existing SSBO instructions. Unfortunately we can't assume that loads are always vec4-aligned, so we have to use the alignment information that NIR gives us. Unfortunately, it's currently woefully inadequate, and will have to be fixed to give us good codegen in the future. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568>	2020-04-15 22:38:20 +00:00
Connor Abbott	274f3815a5	ir3: Plumb through bindless support Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>	2020-04-09 15:56:55 +00:00
Timothy Arceri	52c8bc0130	nir: make opt_if_loop_terminator() less strict nir_cf_{extract,reinsert}() can't stitch a block together if the block we are extracting ends in a jump but other jumps nested in further ifs should be fine to move. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4477>	2020-04-08 01:35:45 +00:00
Caio Marcelo de Oliveira Filho	5dc85abc4f	nir: Add per_view attribute to nir_variable If a nir_variable is tagged with per_view, it must be an array with size corresponding to the number of views. For slot-tracking, it is considered to take just the slot for a single element -- drivers will take care of expanding this appropriately. This will be used to implement the ability of having per-view position in a vertex shader in Intel platforms. Acked-by: Rafael Antognolli <rafael.antognolli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>	2020-04-07 17:16:09 +00:00
Rob Clark	57557783f6	nir/lower_amul: fix slot calculation Fixes incorrect indexing in dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.packed.mat2x3 Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455>	2020-04-06 18:00:17 +00:00
Rob Clark	4638a16a93	nir: add some swizzle helpers Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455>	2020-04-06 18:00:17 +00:00
Jason Ekstrand	e78a7a1825	nir: Assert memory loads are aligned We've had alignment parameters on these operations for a while but a bunch of places weren't setting them. That should be resolved now so we can start validating that they're always set. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4441> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4441>	2020-04-06 15:57:30 +00:00
Hyunjun Ko	9f174eb2df	nir: fix wrong assignment to buffer in xfb_varyings_info Tested with dEQP-VK.transform_feedback.fuzz.various_buffers.buffers100_instance_array_vertex Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4459> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4459>	2020-04-06 08:55:05 +00:00
Rob Clark	bf64648864	nir: fix definition of imadsh_mix16 for vectors Fixes: `c27b3758fa` ("nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes") Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423>	2020-04-04 00:07:10 +00:00
Daniel Schürmann	dc69738b0f	nir: fix unpack_64_4x16 in lower_alu_to_scalar() Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>	2020-04-03 23:13:15 +01:00
Jason Ekstrand	36a32af008	nir/load_store_vectorize: Add support for nir_var_mem_global Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>	2020-04-03 20:26:54 +00:00
Jason Ekstrand	b6273291b5	nir/load_store_vectorize: Use nir_iadd_imm for offsets This makes it capable of handling 64-bit offsets Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>	2020-04-03 20:26:54 +00:00
Jason Ekstrand	04d08ea149	nir/load_store_vectorize: Fix shared atomic info These were clearly copied and pasted from SSBOs. The shared atomics don't have an SSBO index so their offset is src0 and data is src1. Fixes: `ce9205c03b` "nir: add a load/store vectorization pass" Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>	2020-04-03 20:26:54 +00:00
Jason Ekstrand	c71c1f44b0	nir/from_ssa: Only chain movs when a src is also a dest The algorithm we use for resolving parallel copy instructions plays this little shell game with the values. The reason for this is that it lets us handle cases where, for instance, we have a -> b and b -> a and we need to use a temporary to do a swap. One result of this algorithm is that it tends to emit a lot of mov chains which are typically really bad for GPUs where a mov is far from free. For instance, it's likely to turn this: r16 = ssa_0; r17 = ssa_0; r18 = ssa_0; r15 = ssa_0 into this: r15 = mov ssa_0; r18 = mov r15; r17 = mov r18; r16 = mov r17 which, if it's the only thing in a block (this is common for phis) is impossible for a scheduler to fix because of the dependencies and you end up with significant stalling. If, on the other hand, we only do the chaining in the actual case where we need to free up a so that it can be used as a destination, we can emit this: r15 = mov ssa_0; r18 = mov ssa_0; r17 = mov ssa_0; r16 = mov ssa_0 which is far nicer to the scheduler. On Intel, our copy propagation pass will undo the chain for us so this has no shader-db impact. However, for less intelligent back-ends, it's probably a lot better. Reviewed-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412>	2020-04-02 19:06:46 +00:00
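The chaining behaviour described in c71c1f44b0 can be reproduced with a small toy model. The sketch below is not the Mesa implementation: `sequentialize`, its `chain_always` flag, and the `tmpN` temporaries are hypothetical names chosen for illustration, and the algorithm is a generic parallel-copy sequentializer written under the assumption that each destination appears at most once.

```python
def sequentialize(copies, chain_always):
    """Order a parallel copy [(dest, src), ...] into sequential movs.

    Toy model only. chain_always=True mimics the old behaviour (always read
    from the most recent copy of a value); False only redirects readers when
    the source register is itself waiting to be overwritten.
    """
    copies = [(d, s) for d, s in copies if d != s]  # self-copies are no-ops
    pred = {d: s for d, s in copies}   # value each destination must receive
    loc = {s: s for _, s in copies}    # where each source value currently lives
    pending = [d for d, _ in copies]
    ready = [d for d in pending if d not in loc]  # dests that nobody reads
    done, movs, temps = set(), [], 0
    while True:
        while ready:
            b = ready.pop()
            a = pred[b]
            c = loc[a]
            movs.append((b, c))        # emit: b = mov c
            done.add(b)
            if chain_always or a in pred:
                loc[a] = b             # later readers of a now read b
            if a == c and a in pred:
                ready.append(a)        # a's old value is saved, a may be written
        if not pending:
            break
        b = pending.pop()
        if b in done:
            continue
        # b is still unwritten, so it sits on a cycle: save it in a temporary.
        tmp = f"tmp{temps}"
        temps += 1
        movs.append((tmp, b))
        loc[b] = tmp
        ready.append(b)
    return movs


phi_copies = [("r16", "ssa_0"), ("r17", "ssa_0"), ("r18", "ssa_0"), ("r15", "ssa_0")]
chained = sequentialize(phi_copies, chain_always=True)
direct = sequentialize(phi_copies, chain_always=False)
swap = sequentialize([("a", "b"), ("b", "a")], chain_always=False)
```

With `chain_always=True` every mov depends on the previous one, which is the pattern the commit message calls out. With `chain_always=False` the four movs all read `ssa_0` and are independent, while the swap case still goes through a temporary.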
Mark Janes	90a8b458ac	nir: check shader type before writing to shaderinfo.tess union If the shader is not a tessellation shader, then writing to the tess member of the shaderinfo union will overwrite other members and crash. Closes: #2722 Fixes: `f1dd81ae10` ("nir: Collect if shader uses cross-invocation or indirect I/O.") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4408>	2020-04-01 20:25:55 +00:00
Ian Romanick	b097e326b8	nir/algebraic: Remove a redundant fabs pattern Made redundant by `5544b2cbbd` ("nir/algebraic: Use value range analysis to eliminate useless unary ops"). No shader-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>	2020-04-01 00:28:38 +00:00
Ian Romanick	af1bc7e0c7	nir/algebraic: Use value range analysis to convert fmax to fsat This is conceptually similar to the 1-fsat(a) <=> fsat(1-a) rearrangement done in: `3b74790941` ("nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))") and 2d259713b7 ("nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions"). Note: this helps the Aztec Ruins shader that was hurt for spills and fills on Broadwell in the previous commit, but it does not fix the spills or fills. :( All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14528985 -> 14526116 (-0.02%) instructions in affected programs: 477300 -> 474431 (-0.60%) helped: 2332 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 1.23 x̃: 1 helped stats (rel) min: 0.07% max: 8.89% x̄: 0.88% x̃: 0.64% 95% mean confidence interval for instructions value: -1.27 -1.19 95% mean confidence interval for instructions %-change: -0.92% -0.85% Instructions are helped. total cycles in shared programs: 203723684 -> 203692984 (-0.02%) cycles in affected programs: 4878847 -> 4848147 (-0.63%) helped: 1764 HURT: 324 helped stats (abs) min: 1 max: 706 x̄: 22.94 x̃: 17 helped stats (rel) min: <.01% max: 17.75% x̄: 1.94% x̃: 1.66% HURT stats (abs) min: 1 max: 400 x̄: 30.15 x̃: 10 HURT stats (rel) min: <.01% max: 17.76% x̄: 1.91% x̃: 0.69% 95% mean confidence interval for cycles value: -16.55 -12.86 95% mean confidence interval for cycles %-change: -1.44% -1.24% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>	2020-04-01 00:28:38 +00:00
Ian Romanick	62795475e8	nir/algebraic: Distribute source modifiers into instructions There are three main classes of cases that are helped by this change: 1. When the negation is applied to a value being type converted (e.g., float(-x)). This could possibly also be handled with more clever code generation. 2. When the negation is applied to a phi node source (e.g., x = -(...); at the end of a basic block). This was the original case that caught my attention while looking at shader-db dumps. 3. When the negation is applied to the source of an instruction that cannot have source modifiers. This includes texture instructions and math box instructions on pre-Gen7 platforms (see more details below). In many of these cases the negation can be propagated into the instructions that generate the value (e.g., -(a*b) = (-a)*b). In addition to the operations implemented in this patch, I also tried: - frcp - Helped 6 or fewer shaders on Gen7+, and hurt just as many on pre-Gen7. On Gen6 and earlier, frcp is a math box instruction, and math box instructions cannot have source modifiers. I suspect this is why so many more shaders are helped on Gen6 than on Gen5 or Gen7. Gen6 supports OpenGL 3.3, so a lot more shaders compile on it. A lot of these shaders may have things like cos(-x) or rcp(-x) that could result in an explicit negation instruction. - bcsel - Hurt a few shaders with none helped. bcsel operates on integer sources, so the fabs or fneg cannot be a source modifier in the bcsel itself. - Integer instructions - No changes on any Intel platform. Some notes about the shader-db results below. - On Tiger Lake, a single Deus Ex fragment shader is hurt for both spills and fills. - On Haswell, a different Deus Ex fragment shader is hurt for both spills and fills. - On GM45, the "LOST: 1" and "GAINED: 1" is a single Left4Dead 2 (very high graphics settings, lol) fragment shader that upgrades from SIMD8 to SIMD16. v2: Add support for fsign.
Add some patterns that remove redundant negations and redundant absolute value rather than trying to push them down the tree. Tiger Lake total instructions in shared programs: 17611333 -> 17586465 (-0.14%) instructions in affected programs: 3033734 -> 3008866 (-0.82%) helped: 10310 HURT: 632 helped stats (abs) min: 1 max: 35 x̄: 2.61 x̃: 1 helped stats (rel) min: 0.04% max: 16.67% x̄: 1.43% x̃: 1.01% HURT stats (abs) min: 1 max: 47 x̄: 3.21 x̃: 2 HURT stats (rel) min: 0.04% max: 5.08% x̄: 0.88% x̃: 0.63% 95% mean confidence interval for instructions value: -2.33 -2.21 95% mean confidence interval for instructions %-change: -1.32% -1.27% Instructions are helped. total cycles in shared programs: 338365223 -> 338262252 (-0.03%) cycles in affected programs: 125291811 -> 125188840 (-0.08%) helped: 5224 HURT: 2031 helped stats (abs) min: 1 max: 5670 x̄: 46.73 x̃: 12 helped stats (rel) min: <.01% max: 34.78% x̄: 1.91% x̃: 0.97% HURT stats (abs) min: 1 max: 2882 x̄: 69.50 x̃: 14 HURT stats (rel) min: <.01% max: 44.93% x̄: 2.35% x̃: 0.74% 95% mean confidence interval for cycles value: -18.71 -9.68 95% mean confidence interval for cycles %-change: -0.80% -0.63% Cycles are helped. total spills in shared programs: 8942 -> 8946 (0.04%) spills in affected programs: 8 -> 12 (50.00%) helped: 0 HURT: 1 total fills in shared programs: 9399 -> 9401 (0.02%) fills in affected programs: 21 -> 23 (9.52%) helped: 0 HURT: 1 Ice Lake total instructions in shared programs: 16124348 -> 16102258 (-0.14%) instructions in affected programs: 2830928 -> 2808838 (-0.78%) helped: 11294 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.96 x̃: 1 helped stats (rel) min: 0.07% max: 17.65% x̄: 1.32% x̃: 0.93% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.45% max: 4.00% x̄: 3.72% x̃: 3.72% 95% mean confidence interval for instructions value: -1.99 -1.93 95% mean confidence interval for instructions %-change: -1.34% -1.29% Instructions are helped. 
total cycles in shared programs: 335393932 -> 335325794 (-0.02%) cycles in affected programs: 123834609 -> 123766471 (-0.06%) helped: 5034 HURT: 2128 helped stats (abs) min: 1 max: 3256 x̄: 43.39 x̃: 11 helped stats (rel) min: <.01% max: 35.79% x̄: 1.98% x̃: 1.00% HURT stats (abs) min: 1 max: 2634 x̄: 70.63 x̃: 16 HURT stats (rel) min: <.01% max: 49.49% x̄: 2.73% x̃: 0.62% 95% mean confidence interval for cycles value: -13.66 -5.37 95% mean confidence interval for cycles %-change: -0.69% -0.48% Cycles are helped. LOST: 0 GAINED: 2 Skylake total instructions in shared programs: 14949240 -> 14927930 (-0.14%) instructions in affected programs: 2594756 -> 2573446 (-0.82%) helped: 11000 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.94 x̃: 1 helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76% 95% mean confidence interval for instructions value: -1.97 -1.91 95% mean confidence interval for instructions %-change: -1.42% -1.37% Instructions are helped. total cycles in shared programs: 324829346 -> 324821596 (<.01%) cycles in affected programs: 121566087 -> 121558337 (<.01%) helped: 4611 HURT: 2147 helped stats (abs) min: 1 max: 3715 x̄: 33.29 x̃: 10 helped stats (rel) min: <.01% max: 36.08% x̄: 1.94% x̃: 1.00% HURT stats (abs) min: 1 max: 2551 x̄: 67.88 x̃: 16 HURT stats (rel) min: <.01% max: 53.79% x̄: 3.69% x̃: 0.89% 95% mean confidence interval for cycles value: -4.25 1.96 95% mean confidence interval for cycles %-change: -0.28% -0.02% Inconclusive result (value mean confidence interval includes 0). 
Broadwell total instructions in shared programs: 14971203 -> 14949957 (-0.14%) instructions in affected programs: 2635699 -> 2614453 (-0.81%) helped: 10982 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.93 x̃: 1 helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76% 95% mean confidence interval for instructions value: -1.97 -1.90 95% mean confidence interval for instructions %-change: -1.42% -1.37% Instructions are helped. total cycles in shared programs: 336215033 -> 336086458 (-0.04%) cycles in affected programs: 127383198 -> 127254623 (-0.10%) helped: 4884 HURT: 1963 helped stats (abs) min: 1 max: 25696 x̄: 51.78 x̃: 12 helped stats (rel) min: <.01% max: 58.28% x̄: 2.00% x̃: 1.05% HURT stats (abs) min: 1 max: 3401 x̄: 63.33 x̃: 16 HURT stats (rel) min: <.01% max: 39.95% x̄: 2.20% x̃: 0.70% 95% mean confidence interval for cycles value: -29.99 -7.57 95% mean confidence interval for cycles %-change: -0.89% -0.71% Cycles are helped. total fills in shared programs: 24905 -> 24901 (-0.02%) fills in affected programs: 117 -> 113 (-3.42%) helped: 4 HURT: 0 LOST: 0 GAINED: 16 Haswell total instructions in shared programs: 13148927 -> 13131528 (-0.13%) instructions in affected programs: 2220941 -> 2203542 (-0.78%) helped: 8017 HURT: 4 helped stats (abs) min: 1 max: 12 x̄: 2.17 x̃: 1 helped stats (rel) min: 0.07% max: 15.25% x̄: 1.40% x̃: 0.93% HURT stats (abs) min: 1 max: 7 x̄: 2.50 x̃: 1 HURT stats (rel) min: 0.33% max: 4.76% x̄: 2.73% x̃: 2.91% 95% mean confidence interval for instructions value: -2.21 -2.13 95% mean confidence interval for instructions %-change: -1.43% -1.37% Instructions are helped. 
total cycles in shared programs: 321221791 -> 321079870 (-0.04%) cycles in affected programs: 126886055 -> 126744134 (-0.11%) helped: 4674 HURT: 1729 helped stats (abs) min: 1 max: 23654 x̄: 56.47 x̃: 16 helped stats (rel) min: <.01% max: 53.22% x̄: 2.13% x̃: 1.05% HURT stats (abs) min: 1 max: 3694 x̄: 70.58 x̃: 18 HURT stats (rel) min: <.01% max: 63.06% x̄: 2.48% x̃: 0.90% 95% mean confidence interval for cycles value: -33.31 -11.02 95% mean confidence interval for cycles %-change: -0.99% -0.78% Cycles are helped. total spills in shared programs: 19872 -> 19874 (0.01%) spills in affected programs: 21 -> 23 (9.52%) helped: 0 HURT: 1 total fills in shared programs: 20941 -> 20941 (0.00%) fills in affected programs: 62 -> 62 (0.00%) helped: 1 HURT: 1 LOST: 0 GAINED: 8 Ivy Bridge total instructions in shared programs: 11875553 -> 11853839 (-0.18%) instructions in affected programs: 1553112 -> 1531398 (-1.40%) helped: 7304 HURT: 3 helped stats (abs) min: 1 max: 16 x̄: 2.97 x̃: 2 helped stats (rel) min: 0.07% max: 15.25% x̄: 1.62% x̃: 1.15% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.05% max: 3.33% x̄: 2.44% x̃: 2.94% 95% mean confidence interval for instructions value: -3.04 -2.90 95% mean confidence interval for instructions %-change: -1.65% -1.59% Instructions are helped. total cycles in shared programs: 178246425 -> 178184484 (-0.03%) cycles in affected programs: 13702146 -> 13640205 (-0.45%) helped: 4409 HURT: 1566 helped stats (abs) min: 1 max: 531 x̄: 24.52 x̃: 13 helped stats (rel) min: <.01% max: 38.67% x̄: 2.14% x̃: 1.02% HURT stats (abs) min: 1 max: 356 x̄: 29.48 x̃: 10 HURT stats (rel) min: <.01% max: 64.73% x̄: 1.87% x̃: 0.70% 95% mean confidence interval for cycles value: -11.60 -9.14 95% mean confidence interval for cycles %-change: -1.19% -0.99% Cycles are helped. 
LOST: 0 GAINED: 10 Sandy Bridge total instructions in shared programs: 10695740 -> 10667483 (-0.26%) instructions in affected programs: 2337607 -> 2309350 (-1.21%) helped: 10720 HURT: 1 helped stats (abs) min: 1 max: 49 x̄: 2.64 x̃: 2 helped stats (rel) min: 0.07% max: 20.00% x̄: 1.54% x̃: 1.13% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.04% max: 1.04% x̄: 1.04% x̃: 1.04% 95% mean confidence interval for instructions value: -2.69 -2.58 95% mean confidence interval for instructions %-change: -1.57% -1.51% Instructions are helped. total cycles in shared programs: 153478839 -> 153416223 (-0.04%) cycles in affected programs: 22050900 -> 21988284 (-0.28%) helped: 5342 HURT: 2200 helped stats (abs) min: 1 max: 1020 x̄: 20.34 x̃: 16 helped stats (rel) min: <.01% max: 24.05% x̄: 1.51% x̃: 0.86% HURT stats (abs) min: 1 max: 335 x̄: 20.93 x̃: 6 HURT stats (rel) min: <.01% max: 20.18% x̄: 1.03% x̃: 0.30% 95% mean confidence interval for cycles value: -9.18 -7.42 95% mean confidence interval for cycles %-change: -0.82% -0.71% Cycles are helped. Iron Lake total instructions in shared programs: 8114882 -> 8105574 (-0.11%) instructions in affected programs: 1232504 -> 1223196 (-0.76%) helped: 4109 HURT: 2 helped stats (abs) min: 1 max: 6 x̄: 2.27 x̃: 1 helped stats (rel) min: 0.05% max: 8.33% x̄: 0.99% x̃: 0.66% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.94% max: 4.35% x̄: 2.65% x̃: 2.65% 95% mean confidence interval for instructions value: -2.31 -2.21 95% mean confidence interval for instructions %-change: -1.01% -0.96% Instructions are helped. 
total cycles in shared programs: 188504036 -> 188466296 (-0.02%) cycles in affected programs: 31203798 -> 31166058 (-0.12%) helped: 3447 HURT: 36 helped stats (abs) min: 2 max: 92 x̄: 11.03 x̃: 8 helped stats (rel) min: <.01% max: 5.41% x̄: 0.21% x̃: 0.13% HURT stats (abs) min: 2 max: 30 x̄: 7.33 x̃: 6 HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.18% x̃: 0.10% 95% mean confidence interval for cycles value: -11.16 -10.51 95% mean confidence interval for cycles %-change: -0.22% -0.20% Cycles are helped. LOST: 0 GAINED: 1 GM45 total instructions in shared programs: 4989697 -> 4984531 (-0.10%) instructions in affected programs: 703952 -> 698786 (-0.73%) helped: 2493 HURT: 2 helped stats (abs) min: 1 max: 6 x̄: 2.07 x̃: 1 helped stats (rel) min: 0.05% max: 8.33% x̄: 1.03% x̃: 0.66% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.95% max: 4.35% x̄: 2.65% x̃: 2.65% 95% mean confidence interval for instructions value: -2.13 -2.01 95% mean confidence interval for instructions %-change: -1.07% -0.99% Instructions are helped. total cycles in shared programs: 128929136 -> 128903886 (-0.02%) cycles in affected programs: 21583096 -> 21557846 (-0.12%) helped: 2214 HURT: 17 helped stats (abs) min: 2 max: 92 x̄: 11.44 x̃: 8 helped stats (rel) min: <.01% max: 5.41% x̄: 0.24% x̃: 0.13% HURT stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4 HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.20% x̃: 0.09% 95% mean confidence interval for cycles value: -11.75 -10.88 95% mean confidence interval for cycles %-change: -0.25% -0.22% Cycles are helped. LOST: 1 GAINED: 1 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>	2020-04-01 00:28:38 +00:00
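The idea behind 62795475e8 (pushing a negation into the instruction that produces the value, and dropping redundant negations and absolute values) can be illustrated on tuple expressions in the style of the nir/algebraic search patterns. This is a hedged sketch, not the patterns added by the commit: `push_neg` and `evaluate` are hypothetical helpers, and the specific rewrites shown (`-(a*b)`, `-(-a)`, `|-a|`, `||a||`) are assumptions chosen because they are the simplest cases that match the description.

```python
def push_neg(expr):
    """Rewrite a tuple expression so that fneg sinks toward the leaves."""
    if not isinstance(expr, tuple):
        return expr                      # a leaf: plain variable name
    op, *srcs = expr
    srcs = [push_neg(s) for s in srcs]   # rewrite sources first
    if op == "fneg" and isinstance(srcs[0], tuple):
        inner_op, *inner = srcs[0]
        if inner_op == "fneg":           # -(-a) -> a
            return inner[0]
        if inner_op == "fmul":           # -(a*b) -> (-a)*b
            return ("fmul", push_neg(("fneg", inner[0])), inner[1])
    if op == "fabs" and isinstance(srcs[0], tuple) and srcs[0][0] in ("fneg", "fabs"):
        return ("fabs", srcs[0][1])      # |-a| -> |a| and ||a|| -> |a|
    return (op, *srcs)


def evaluate(expr, env):
    """Evaluate a tuple expression with Python floats, for checking rewrites."""
    if not isinstance(expr, tuple):
        return env[expr]
    op, *srcs = expr
    vals = [evaluate(s, env) for s in srcs]
    if op == "fneg":
        return -vals[0]
    if op == "fabs":
        return abs(vals[0])
    if op == "fmul":
        return vals[0] * vals[1]
    raise ValueError(f"unknown op {op}")


before = ("fneg", ("fmul", "a", "b"))
after = push_neg(before)
env = {"a": 3.0, "b": -2.0}
same_value = evaluate(before, env) == evaluate(after, env)
```

Here `after` is `("fmul", ("fneg", "a"), "b")`: the negation now sits on a source of the multiply, where a back-end with source modifiers can fold it, instead of on the result.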
Ian Romanick	c0bdf37c91	nir/algebraic: Change the default cursor location when replacing a unary op If the expression tree that is being replaced has a unary operation at its root, set the cursor (location where new instructions are inserted) at the source instruction instead. This doesn't do much now because there are very few patterns that have a unary operation as the root. Almost all of the patterns that do have a unary operation as the root have inot. All of the shaders that are affected by this commit have expression trees with an inot at the root. This change prevents some significant, spurious caused by the next commit. There is further explanation in the large comment added in the code. I also considered a couple other options that may still be worth exploring. 1. Add some mark-up to the search pattern to denote where new instructions should be added. I considered using "@" to denote the cursor location. For example, (('fneg', ('fadd@', a, b)), ...) 2. To prevent other kinds of unintended code motion, add the ability to name expressions in the search pattern so that they can be reused in the replacement. For example, (('bcsel', ('ige', ('find_lsb=b', a), 0), ('find_lsb', a), -1), b), An alternative would be to add some kind of CSE at the time of inserting the replacements. Create a new instruction, then check to see if it already exists. That option might be better overall. Over the years I know Matt has heard me complain, "I added a pattern that just deleted an instruction, but it added a bunch of spills!" This was always in large, complex shaders that are very hard to analyze. I always blamed these cases on the scheduler being dumb. I am now very suspicious that unintended code motion was the real problem. All Gen4+ Intel platforms had similar results. 
(Tiger Lake shown) total instructions in shared programs: 17611405 -> 17611333 (<.01%) instructions in affected programs: 18613 -> 18541 (-0.39%) helped: 41 HURT: 13 helped stats (abs) min: 1 max: 18 x̄: 4.46 x̃: 4 helped stats (rel) min: 0.27% max: 5.68% x̄: 1.29% x̃: 1.34% HURT stats (abs) min: 1 max: 20 x̄: 8.54 x̃: 7 HURT stats (rel) min: 0.30% max: 4.20% x̄: 2.15% x̃: 2.38% 95% mean confidence interval for instructions value: -3.29 0.63 95% mean confidence interval for instructions %-change: -0.95% 0.02% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 338366118 -> 338365223 (<.01%) cycles in affected programs: 257889 -> 256994 (-0.35%) helped: 42 HURT: 15 helped stats (abs) min: 2 max: 120 x̄: 39.38 x̃: 34 helped stats (rel) min: 0.04% max: 2.55% x̄: 0.86% x̃: 0.76% HURT stats (abs) min: 6 max: 204 x̄: 50.60 x̃: 34 HURT stats (rel) min: 0.11% max: 4.75% x̄: 1.12% x̃: 0.56% 95% mean confidence interval for cycles value: -30.39 -1.02 95% mean confidence interval for cycles %-change: -0.66% -0.02% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>	2020-04-01 00:28:38 +00:00
Timothy Arceri	0f4a81430e	nir: fix crash in varying packing on interface mismatch For example when the outputs are scalars but the inputs are struct members. Fixes: `26aa460940` ("nir: rewrite varying component packing") Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4351> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4351>	2020-03-31 23:43:31 +00:00
Jason Ekstrand	9468f0729b	nir: Handle vec8/16 in nir_shrink_array_vars Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	c26bf848ba	nir: Handle vec8/16 in opt_undef_vecN Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	99540edfde	nir: Treat vec8/16 as select in opt_peephole_select Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	e3554a293b	nir: Handle vec8/16 in opt_split_alu_of_phi Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	2aab7999e4	nir: Handle vec8/16 in lower_regs_to_ssa Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	1033255952	nir: Handle vec8/16 in lower_phis_to_scalar Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	ac7a940eba	nir: Handle vec8/16 in gather_ssa_types Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	a18c4ee7b0	nir: Handle vec8/16 in bool_to_bitsize Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	f5bbdf7621	nir: Copy propagate through vec8s and vec16s Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	842338e2f0	nir: Add a nir_op_is_vec helper Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	84ab61160a	nir/algebraic: Add downcast-of-pack opts Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	14a49f31d3	nir/lower_int64: Lower 8 and 16-bit downcasts with nir_lower_mov64 We have the code to do the lowering, we were just missing the boilerplate bits to make should_lower_int64_alu_instr return true. Fixes: `62d55f1281` "nir: Wire up int64 lowering functions" Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>	2020-03-31 00:18:05 +00:00
Jason Ekstrand	b113170559	nir/opt_loop_unroll: Fix has_nested_loop handling In `87839680c0`, a very subtle mistake was made with the CFG walking recursion. Instead of setting the local has_nested_loop variable when processing child loops, has_nested_loop_out was passed directly into the process_loop_in_block call. This broke nested loop detection heuristics and caused loop unrolling to run massively out of control. In particular, it makes the following CTS test compile virtually forever: dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom Fixes: `87839680c0` "nir: Fix breakage of foreach_list_typed_safe..." Closes: #2710 Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>	2020-03-30 22:20:47 +00:00
Jason Ekstrand	f5b14d983e	nir: Set UBO alignments in lower_uniforms_to_ubo Fixes: `fb64954d9d` "nir: Validate that memory load/store ops work on..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378>	2020-03-30 19:18:17 +00:00
Jason Ekstrand	fb64954d9d	nir: Validate that memory load/store ops work on whole bytes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>	2020-03-30 15:46:19 +00:00
Jason Ekstrand	c217ee8d35	nir: Insert b2b1s around booleans in nir_lower_to By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics generated by nir_lower_io, we can ensure that the intrinsic has the correct destination bit size. Not having the right size can mess up passes which try to optimize access. In particular, it was causing brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans, which meant that boolean uniforms weren't getting pushed as push constants. I don't think this is an actual functional bug anywhere, hence no CC to stable, but it may improve perf somewhere. Shader-db results on ICL with iris: total instructions in shared programs: 16076707 -> 16075246 (<.01%) instructions in affected programs: 129034 -> 127573 (-1.13%) helped: 487 HURT: 0 helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36% 95% mean confidence interval for instructions value: -3.00 -3.00 95% mean confidence interval for instructions %-change: -1.37% -1.29% Instructions are helped. total cycles in shared programs: 338015639 -> 337983311 (<.01%) cycles in affected programs: 971986 -> 939658 (-3.33%) helped: 362 HURT: 110 helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43 helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60% HURT stats (abs) min: 1 max: 554 x̄: 26.55 x̃: 18 HURT stats (rel) min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96% 95% mean confidence interval for cycles value: -79.97 -57.01 95% mean confidence interval for cycles %-change: -4.60% -3.47% Cycles are helped. total sends in shared programs: 815037 -> 814550 (-0.06%) sends in affected programs: 5701 -> 5214 (-8.54%) helped: 487 HURT: 0 LOST: 2 GAINED: 0 The two lost programs were SIMD16 shaders in CS:GO. However, CS:GO was also one of the most helped programs where it shaves sends off of 134 programs. This seems to reduce GPU core clocks by about 4% on the first 1000 frames of the PTS benchmark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>	2020-03-30 15:46:19 +00:00
Jason Ekstrand	d2dfcee7f7	nir: Use b2b opcodes for shared and constant memory No shader-db changes on ICL with iris Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>	2020-03-30 15:46:19 +00:00
Jason Ekstrand	b2db84153a	nir: Add b2b opcodes These exist to convert between different types of boolean values. In particular, we want to use these for uniform and shared memory operations where we need to convert to a reasonably sized boolean but we don't care what its format is so we don't want to make the back-end insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the format of the uniform boolean to whatever the driver wants. In the case of shared, every value in a shared variable comes from the shader so it's already in the right boolean format. The new boolean conversion opcodes get replaced with mov in lower_bool_to_int/float32 so the back-end will hopefully never see them. However, while we're in the middle of optimizing our NIR, they let us have sensible load_uniform/ubo intrinsics and also have the bit size conversion. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>	2020-03-30 15:46:19 +00:00
Samuel Pitoiset	3935a729d9	nir/algebraic: add fexp2(fmul(flog2(a), 0.5)) -> fsqrt(a) optimization Helps some Wolfenstein II and Wolfenstein Youngblood shaders. pipeline-db (VEGA10/ACO): Totals from affected shaders: SGPRS: 17904 -> 17904 (0.00 %) VGPRS: 14492 -> 14492 (0.00 %) Spilled SGPRs: 20 -> 20 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 1753152 -> 1749708 (-0.20 %) bytes Max Waves: 2581 -> 2581 (0.00 %) pipeline-db (VEGA10/LLVM): Totals from affected shaders: SGPRS: 26656 -> 26656 (0.00 %) VGPRS: 23780 -> 23780 (0.00 %) Spilled SGPRs: 2112 -> 2112 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 2552712 -> 2549236 (-0.14 %) bytes Max Waves: 3359 -> 3359 (0.00 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353>	2020-03-30 14:07:43 +00:00
Timur Kristóf	f1dd81ae10	nir: Collect if shader uses cross-invocation or indirect I/O. The following new fields are added to tess shader info: * `tcs_cross_invocation_inputs_read` * `tcs_cross_invocation_outputs_read` These are I/O masks that are a subset of inputs_read and outputs_read and they contain which per-vertex inputs and outputs are read cross-invocation. Additionally, the following new fields are added to shader_info: * `inputs_read_indirectly` * `outputs_accessed_indirectly` * `patch_inputs_read_indirectly` * `patch_outputs_accessed_indirectly` These new fields can be used for optimizing TCS in a back-end compiler. If you can be sure that the TCS doesn't use cross-invocation inputs or outputs, you can choose a different strategy for storing VS and TCS outputs. However, such optimizations might need to be disabled when the inputs/outputs are accessed indirectly due to backend limitations, so this information is also collected. Example: RADV currently has to store all VS and TCS outputs in LDS, but for shaders where only inputs and/or outputs belonging to the current invocation ID are used, it could skip storing these in LDS entirely. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>	2020-03-30 13:09:08 +00:00
Danylo Piliaiev	87839680c0	nir: Fix breakage of foreach_list_typed_safe assumptions in loop unrolling foreach_list_typed_safe works with the assumption that even if the current node becomes invalid, the next one will still be valid. However, process_loops broke this assumption: when an immediate child is unrolled during iteration, not only the current node but also the one after it could be removed. This doesn't cause issues now but it will cause issues when undefined behaviour in foreach* macros is fixed. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189>	2020-03-30 14:41:30 +03:00
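
The "safe" iteration assumption described in 87839680c0 can be illustrated with a minimal sketch. This is not Mesa's code: `struct node`, `foreach_node_safe`, `push`, `remove_even`, `list_sum`, `list_len` and `list_free` are hypothetical names made up for this example, and the list is a plain singly linked list rather than Mesa's own list type. The macro caches the successor before the loop body runs, so the body may free the current node, but it must never free the cached successor.

```c
#include <stdlib.h>

/* Hypothetical minimal list for illustration only; not Mesa's list type. */
struct node {
    int value;
    struct node *next;
};

/* "Safe" iteration: the successor is cached in `nxt` before the body runs.
 * The body may free `cur`. If the body also freed `nxt`, the next step
 * would read a dangling pointer, which is the hazard the commit describes. */
#define foreach_node_safe(cur, nxt, head)                          \
    for (struct node *cur = (head), *nxt = cur ? cur->next : NULL; \
         cur != NULL;                                              \
         cur = nxt, nxt = cur ? cur->next : NULL)

/* Prepend a value and return the new head. */
static struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        abort();
    n->value = value;
    n->next = head;
    return n;
}

/* Remove every node holding an even value; only `cur` is ever freed. */
static struct node *remove_even(struct node *head)
{
    struct node **link = &head; /* pointer to the link that leads to cur */
    foreach_node_safe(cur, nxt, head) {
        if (cur->value % 2 == 0) {
            *link = nxt; /* unlink cur, then free it */
            free(cur);
        } else {
            link = &cur->next;
        }
    }
    return head;
}

static int list_sum(const struct node *head)
{
    int sum = 0;
    for (; head != NULL; head = head->next)
        sum += head->value;
    return sum;
}

static int list_len(const struct node *head)
{
    int len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}

static void list_free(struct node *head)
{
    foreach_node_safe(cur, nxt, head)
        free(cur);
}
```

Building the list 1..6 with `push` and calling `remove_even` leaves the odd nodes. A body that removed both `cur` and `nxt` in the same step would violate the macro's contract, which is the situation the commit message attributes to process_loops.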

2176 commits