Store the fragment shader inputs in the program data so we can use them
later when required without needing the NIR shader.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2010>
Avoid looping over all VARYING_SLOT_MAX urb_setup array
entries in genX_upload_sbe. Prepare an array indirection
to the active entries of urb_setup already in the compile
step. On upload, only walk the active entries.
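Roughly, the compile-time side of this could look like the sketch below;
the struct and helper names are illustrative rather than the exact ones
from the patch, and genX_upload_sbe then iterates over
urb_setup_attribs_count entries instead of all of VARYING_SLOT_MAX:

   #include <stdint.h>

   #define VARYING_SLOT_MAX 64   /* stand-in for the real enum value */

   /* Illustrative subset of the prog_data fields involved. */
   struct example_wm_prog_data {
      int urb_setup[VARYING_SLOT_MAX];             /* slot -> attribute or -1 */
      uint8_t urb_setup_attribs[VARYING_SLOT_MAX]; /* indirection to active slots */
      uint8_t urb_setup_attribs_count;
   };

   /* Compile step: record which urb_setup slots are active so the SBE
    * upload path only has to walk this compact list. */
   static void
   example_compute_urb_setup_index(struct example_wm_prog_data *prog_data)
   {
      prog_data->urb_setup_attribs_count = 0;
      for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
         if (prog_data->urb_setup[attr] >= 0) {
            prog_data->urb_setup_attribs[prog_data->urb_setup_attribs_count++] =
               (uint8_t) attr;
         }
      }
   }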
v2: Use uint8_t to store the attribute numbers.
v3: Change loop to build up the array indirection.
v4: Rebase.
v5: Style fix.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
Passing shader_stats to the fs_generator constructor means that the
SIMD8 shader stats from the visitor (such as the scheduler mode) will be
reported out for the SIMD16/SIMD32 versions as well.
As you can see, we are now passing 'shader_stats' and 'stats' to
generate_code(), which is obviously odd looking. Ian rebased and
committed an old patch of mine which added the shader_stats struct on
July 30 in commit dabb5d4bee (i965/fs: Add a shader_stats struct.) and
shortly after on August 12 Jason added the brw_compile_stats struct in
commit 134607760a (intel/compiler: Fill a compiler statistics struct).
I'd like to combine the two, but I'm not sure how. shader_stats is an
input to generate_code() while brw_compile_stats is an output and is
only used by the Vulkan driver. Leave it as is for now...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
This defines a new BRW_ANALYSIS object which wraps the register
pressure computation code along with its result. For the rationale
see the previous commits converting the liveness and dominance
analysis passes to the IR analysis framework.
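As a rough illustration, the wrapper follows the usual lazily-computed,
cached-analysis shape sketched below; the class and method names here are
illustrative and not the exact framework API:

   #include <cstddef>

   /* L is the analysis result (e.g. the register pressure data), C the IR
    * it is computed from. */
   template<typename L, typename C>
   class example_analysis {
   public:
      example_analysis(const C *ir) : ir(ir), result(NULL) {}
      ~example_analysis() { delete result; }

      /* Compute the result on first use, return the cached copy afterwards. */
      const L &require() const
      {
         if (!result)
            result = new L(ir);
         return *result;
      }

      /* Called when the IR changes in a way the analysis depends on. */
      void invalidate()
      {
         delete result;
         result = NULL;
      }

   private:
      const C *ir;
      mutable L *result;
   };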
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward. The
fs_visitor invalidate_live_intervals() and calculate_live_intervals()
methods are no longer necessary after this change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This moves the following methods that are currently defined in
fs_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_fs_live_variables.cpp:
> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;
It makes sense for them to be part of the fs_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.
This shortens the virtual_grf prefix in order to compensate for the
slightly increased lexical overhead from the live_intervals pointer
dereference.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR. For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.
This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.
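In rough terms the entry point currently behaves as in the sketch below;
the signature and the dependency enum are paraphrased from the
description above rather than copied from the actual code:

   void
   fs_visitor::invalidate_analysis(brw::analysis_dependency_class c)
   {
      /* For the moment any reported change simply throws away the liveness
       * information; finer-grained handling of the dependency classes comes
       * in a later commit. */
      if (c)
         invalidate_live_intervals();
   }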
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Our current GPGPU_WALKER code only supports up to 64 threads.
On HSW we could use up to 70 and TGL up to 112, but only if the walker
is adjusted so the width does not exceed 64. Work to support this is
in progress.
Prior to this change, we might try to downgrade to SIMD8 if the
SIMD16 shader spilled. Since HSW and TGL have maximum thread
counts above 64, we would then try to emit an invalid GPGPU_WALKER
command.
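A fix along these lines amounts to clamping the thread count the compiler
is willing to use, e.g. as in the sketch below (gen_device_info and MIN2
are the existing helpers; the function name is made up for illustration):

   /* Limit the compute thread count to what GPGPU_WALKER can currently
    * express, even on parts whose hardware maximum is higher (70 on HSW,
    * 112 on TGL). */
   static unsigned
   example_cs_threads(const struct gen_device_info *devinfo)
   {
      return MIN2(devinfo->max_cs_threads, 64);
   }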
Fixes: 932045061b ("i965/cs: Emit compute shader code and upload programs")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
At this point this simply involves fixing the initialization of the
sample mask flag register to take the right dispatch mask from the
thread payload, and fixing sample_mask_reg() to return f1.1 for the
second half of a SIMD32 thread. This improves Manhattan 3.1
performance by 2.4%±0.31% (N>40) on my ICL with SIMD32 enabled
relative to falling back to SIMD16 for the shaders that use discard.
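Within the FS back-end the helper ends up doing something like the
fragment below (a hedged sketch assuming the existing fs_builder and
brw_flag_reg helpers, not the exact implementation):

   /* With discard enabled in a SIMD32 program the sample mask lives in
    * f1.0 for channels 0-15 and in f1.1 for channels 16-31, so the flag
    * subregister depends on which half the builder is operating on. */
   static struct brw_reg
   example_sample_mask_reg(const fs_builder &bld)
   {
      return brw_flag_reg(1, bld.group() / 16);
   }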
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In SIMD32 programs that don't use discard, the upper 16 bits of the UD
result of sample_mask_reg() don't contain the sample mask of the upper
16 channels as one would expect. Stop pretending we are returning a
valid 32-bit mask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
FIND_LIVE_CHANNEL was using f1.0-f1.1 as a temporary flag register on
Gen7; use f0.0-f0.1 instead. In order to avoid a collision with the
discard sample mask, move the latter to f1.0-f1.1. This makes room
for keeping track of the sample mask of the second half of SIMD32
programs that use discard.
Note that some MOVs of the sample mask into f1.0 become redundant now
in lower_surface_logical_send() and lower_a64_logical_send().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use it instead of hard-coding f0.1 for the sample mask of programs
that use discard. This will make the task easier when we replace f0.1
with another flag register location in order to support discard with
SIMD32 shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's only really useful there. This will avoid confusion with another
helper with a similar purpose I'm about to introduce that will be
useful in multiple files from the FS back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently the "Source0 Alpha Present to RenderTarget" bit of the RT
write message header is derived from brw_wm_prog_data::replicate_alpha.
However the src0_alpha payload is provided anytime it's specified to
the logical message. This could theoretically lead to an
inconsistency if somebody provided a src0_alpha value while
brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in
a future commit in order to implement a hardware workaround.
Instead calculate the header bit based on whether a src0_alpha value
was provided to the logical message, which guarantees the same
behavior on pre-ICL and ICL+ (the latter used an extended descriptor
bit for this which didn't suffer from the same issue). Remove the
brw_wm_prog_data::replicate_alpha flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a less invasive alternative to the workaround documented in
the hardware spec for GEN:BUG:1407528679, which doesn't involve
disabling structured control flow (it's unlikely that switching to
GOTO/JOIN would have actually fixed the problem anyway).
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This may break assumptions of some NoMask
SEND messages whose descriptor depends on data generated by live
invocations of the shader.
This avoids the problem by predicating certain instructions on an ANY
horizontal predicate that makes sure that their execution is omitted
when all channels of the program are disabled. The shader-db impact
of this patch seems to be minimal:
total instructions in shared programs: 17169833 -> 17169913 (0.00%)
instructions in affected programs: 30663 -> 30743 (0.26%)
helped: 0
HURT: 42
total cycles in shared programs: 336966176 -> 336968568 (0.00%)
cycles in affected programs: 2367290 -> 2369682 (0.10%)
helped: 0
HURT: 13
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
We need to pass a width of 32 since the opcode bashes the whole f1.0
register on IVB. This is unlikely to have caused problems since f1.0
is largely unused currently. That's likely to change soon though,
even on platforms other than Gen7.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
v2: Add a big comment explaining the [IU]SUB_SAT lowering. Suggested by
Caio.
v3: Use get_fpu_lowered_simd_width in get_lowered_simd_width. Suggested
by Ken on IRC.
v4: Fix a typo in a comment. Noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Move the check to fs_visitor::lower_integer_multiplication.
Previously, in the cases where lowering was skipped, the original
instruction was removed by fs_visitor::lower_integer_multiplication.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
This involves permuting the registers of barycentric vectors to have
the standard X[0-n] Y[0-n] layout at NIR translation time.
Barycentrics are converted to the format expected by the PLN
instruction in the lower_barycentrics() pass run after the
optimization loop.
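For reference, the two layouts relate as in the standalone example below;
this is plain C purely to illustrate the data layout, the actual pass of
course permutes registers rather than floats:

   /* The standard layout keeps all X channels followed by all Y channels;
    * the PLN layout interleaves them per SIMD8 half, e.g. for SIMD16:
    * X0-7 Y0-7 X8-15 Y8-15. */
   static void
   example_to_pln_layout(float *pln, const float *standard, unsigned simd_width)
   {
      for (unsigned half = 0; half < simd_width / 8; half++) {
         for (unsigned i = 0; i < 8; i++) {
            pln[half * 16 + i]     = standard[half * 8 + i];              /* X */
            pln[half * 16 + 8 + i] = standard[simd_width + half * 8 + i]; /* Y */
         }
      }
   }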
The main reason is correctness of SIMD32 fragment shaders. The
shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used
during NIR translation are busted for SIMD32. This leads to serious
corruption at present with INTEL_DEBUG=do32, especially on Gen11+
where these helpers are hit more frequently due to the lack of a
hardware PLN instruction.
Of course one could have chosen to fix those helpers instead, but
there is another far more subtle issue that was reported during review
of the SIMD32 fragment shader codegen changes: The SIMD splitting pass
currently handles SIMD32 barycentric vectors as if they had the
standard X[0-n] Y[0-n] layout, even though they are interleaved for
the PLN instruction, which causes incorrect execution masks to be
applied to the MOVs unzipping barycentric vectors in cases where a
LINTERP instruction occurs under non-uniform control flow.
I'm not aware of any conformance regressions due to the latter issue
at present, but for our peace of mind let's move the conversion to the
PLN layout into the lower_barycentrics() pass run after
lower_simd_width().
This leads to the following shader-db improvements (including SIMD32
shaders) in combination with the previous back-end preparation changes
-- Without them (especially the copy propagation changes) this would
lead to a massive number of regressions. On ICL:
total instructions in shared programs: 20662316 -> 20466903 (-0.95%)
instructions in affected programs: 10538474 -> 10343061 (-1.85%)
helped: 68775
HURT: 6
total spills in shared programs: 8938 -> 8748 (-2.13%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8965 -> 8663 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 146
GAINED: 43
On SKL:
total instructions in shared programs: 18725867 -> 18614912 (-0.59%)
instructions in affected programs: 3876590 -> 3765635 (-2.86%)
helped: 27492
HURT: 2
LOST: 191
GAINED: 417
On SNB:
total instructions in shared programs: 14573613 -> 13980646 (-4.07%)
instructions in affected programs: 5199074 -> 4606107 (-11.41%)
helped: 29998
HURT: 0
LOST: 21
GAINED: 30
Results are somewhat less impressive but still significant without
SIMD32 fragment shaders enabled. On ICL:
total instructions in shared programs: 16148728 -> 16061659 (-0.54%)
instructions in affected programs: 6114788 -> 6027719 (-1.42%)
helped: 42046
HURT: 6
total spills in shared programs: 8218 -> 8028 (-2.31%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8953 -> 8651 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 0
GAINED: 3
On SKL:
total instructions in shared programs: 14927994 -> 14926738 (-0.01%)
instructions in affected programs: 168850 -> 167594 (-0.74%)
helped: 711
HURT: 2
On SNB:
total instructions in shared programs: 10770538 -> 10734403 (-0.34%)
instructions in affected programs: 2702172 -> 2666037 (-1.34%)
helped: 17818
HURT: 0
All of the hurt shaders are either spilling slightly more or emitting
additional NOP instructions due to the SIMD16 POW workaround for
Gen8-9 combined with differences in scheduling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal is to represent barycentrics with the standard vector layout
during optimization and particularly SIMD lowering. Instead of
emitting the barycentric layout conversions at NIR translation time,
do them later as a lowering pass. For the moment this is only applied
to PI messages, but we'll give the same treatment to LINTERP
instructions too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reworks the current fs_inst::is_copy_payload() method into a
number of classification helpers with well-defined semantics. This
will be useful later on in order to optimize LOAD_PAYLOAD instructions
more aggressively in cases where we can determine it's safe to do so.
The closest equivalent of the present fs_inst::is_copy_payload()
method is the is_coalescing_payload() helper introduced here.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where LOAD_PAYLOAD is provided a pair of contiguous registers
as header sources, try to use a single SIMD16 instruction in order to
initialize them. This is unlikely to affect the overall cycle count
of the shader, since the compressed instruction has twice the issue
time, except due to the reduced pressure on the instruction cache.
Main motivation is avoiding instruction-count regressions in
combination with the following copy propagation improvements, which
will allow the SIMD16 g0-1 header setup emitted for framebuffer writes
to be copy-propagated into its LOAD_PAYLOAD, leading to the emission
of two SIMD8 MOV instructions instead of a single SIMD16 MOV.
Reverting this commit on top of the copy propagation changes would
lead to the following shader-db regressions on SKL and other
platforms:
total instructions in shared programs: 14926738 -> 14935415 (0.06%)
instructions in affected programs: 1892445 -> 1901122 (0.46%)
helped: 0
HURT: 8676
Without the following copy propagation changes this doesn't have any
effect on shader-db on Gen7+, because we would typically set up the FB
write header with a separate SIMD16 MOV that isn't currently
copy-propagated into the LOAD_PAYLOAD, so the individual SIMD8 MOVs
resulting from LOAD_PAYLOAD lowering would get register-coalesced away
under normal circumstances. However that wasn't the case for MRF
LOAD_PAYLOAD destinations on Gen6 and earlier, because register
coalesce only kicks in for GRFs, leaving a number of redundant SIMD8
MOVs lying around. On SNB this leads to the following shader-db
improvements:
total instructions in shared programs: 10770538 -> 10734681 (-0.33%)
instructions in affected programs: 2700655 -> 2664798 (-1.33%)
helped: 17791
HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.
This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents invalid code from being emitted for ROR/ROL instructions in
SIMD32 shaders.
The problem can be reproduced with the following tests while forcing
SIMD32 to be used for fragment shaders:
piglit.shaders.glsl-rotate-left
piglit.shaders.glsl-rotate-right
However the issue could occur in production already with compute
shaders and a workgroup size large enough to trigger SIMD32 dispatch.
Fixes: 83fdec0f0d "intel/compiler: Enable the emission of ROR/ROL instructions"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the actual value to later be
reinterpreted as a negative number (under 16 bits).
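The corrected condition boils down to something like the sketch below
(illustrative, not the exact driver code):

   #include <stdbool.h>
   #include <stdint.h>

   /* A signed immediate fits in a word only within the int16_t range;
    * values in (INT16_MAX, UINT16_MAX] must be rejected for signed types
    * since they would be reinterpreted as negative 16-bit numbers.  An
    * unsigned immediate merely has to be <= UINT16_MAX. */
   static bool
   example_imm_fits_in_word(bool is_signed, uint32_t bits)
   {
      if (is_signed) {
         const int32_t d = (int32_t) bits;
         return d >= INT16_MIN && d <= INT16_MAX;
      }
      return bits <= UINT16_MAX;
   }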
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms
that don't support MUL with 32x32 types, including ICL and TGL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch introduces an option to force SIMD8 and uses it as a
workaround for dual-source blending issues seen with skqp (the Skia test
suite) on gen8.
Fixes the following Piglit test on gen8 platforms:
arb_blend_func_extended-dual-src-blending-issue-1917
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1917
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reverts commit 52c7df1643. The pass,
while clearly useful for some shaders, has at least three bugs that I
was able to find fairly quickly:
1. It doesn't work for type-converting MOVs because f > 0 is not the
same as f2i(f) > 0
2. CSEL is a 3src instruction and only supports one source type; it
doesn't take this into account and tries to create instructions
which do a F compare and a D select. This is especially nasty to
debug because you don't see that in the dumped assembly because we
don't properly assert that types are the same in codegen.
3. While you can handle 2, in theory, by reinterpreting types, you
can't do that in the presence of source modifiers. This pass
doesn't even attempt to detect that.
Those are just the ones I found with the one almost trivial shader I was
debugging. There very likely may be more. The best thing to do for now
is just to shut the pass off until someone has the time to figure out how
to do this properly and write tests to ensure it's correct.
Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
Reviewed-by: Brian Paul <brianp@vmware.com>
This commit fills in a number of different pieces:
1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
new intrinsics. This involves simple plumbing work as well as a
tiny bit of extra logic to always scalarize scratch intrinsics.
2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
into byte/dword scattered read/write messages which use the A32
stateless model.
3. Add code to lower_surface_logical_send to handle dword scattered
messages and the A32 stateless model.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.
v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)
Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The bit moved on gen12 in order to prepare for dual-SIMD8 dispatch.
This implementation isn't entirely complete, as it only works on SIMD8
and SIMD16 and not dual-SIMD8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to a hardware bug filed as HSDES#1604601757.
v2: Only return early when the result of fs_inst::can_do_source_mods()
    is known to be false, in case new orthogonal restrictions are
    implemented below in the future. (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Gen12+ hardware lacks the register scoreboard logic that used to
guarantee data coherency between register reads and writes in previous
generations. This lowering pass runs after register allocation in
order to make up for it.
It works by performing global dataflow analysis in order to determine
the set of potential dependencies of every instruction in the shader,
and then inserting any required SWSB annotations and additional SYNC
instructions in order to guarantee data coherency.
v2: Drop unnecessary _safe list iteration (Caio).
v3: Temporarily work around a potential WaR hazard between an FPU
    instruction and a subsequent out-of-order write, pending
clarification from the hardware team. Drop redundant tracking of
implicit access of acc0-1, since the hardware guarantees coherency
of these (but not the other accumulators...).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>