fdo-mirrors/mesa - Hyprland Development

106673 commits 64 branches 914 tags 1 GiB

Author	SHA1	Message	Date
Iago Toral Quiroga	3918943211	intel/compiler: do not copy-propagate strided regions to ddx/ddy arguments The implementation of these opcodes in the generator assumes that their arguments are packed, and it generates register regions based on that assumption. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 08:09:45 +01:00
Ian Romanick	d515c75463	intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages v2: Split changes to the message type field to another patch. Suggested by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Francisco Jerez	11674dad8a	intel/fs: Optimize and simplify the copy propagation dataflow logic. Previously the dataflow propagation algorithm would calculate the ACP live-in and -out sets in a two-pass fixed-point algorithm. The first pass would update the live-out sets of all basic blocks of the program based on their live-in sets, while the second pass would update the live-in sets based on the live-out sets. This is incredibly inefficient in the typical case where the CFG of the program is approximately acyclic, because it can take up to 2*n passes for an ACP entry introduced at the top of the program to reach the bottom (where n is the number of basic blocks in the program), until which point the algorithm won't be able to reach a fixed point. The same effect can be achieved in a single pass by computing the live-in and -out sets in lock-step, because that makes sure that processing of any basic block will pick up the updated live-out sets of the lexically preceding blocks. This gives the dataflow propagation algorithm effectively O(n) run-time instead of O(n^2) in the acyclic case. The time spent in dataflow propagation is reduced by 30x in the GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP test-case on my CHV system (the improvement is likely to be of the same order of magnitude on other platforms). This more than reverses an apparent run-time regression in this test-case from my previous copy-propagation undefined-value handling patch, which was ultimately caused by the additional work introduced in that commit to account for undefined values being multiplied by a huge quadratic factor. According to Chad this test was failing on CHV due to a 30s time-out imposed by the Android CTS (this was the case regardless of my undefined-value handling patch, even though my patch substantially exacerbated the issue). On my CHV system this patch reduces the overall run-time of the test by approximately 12x, getting us to around 13s, well below the time-out. v2: Initialize live-out set to the universal set to avoid rather pessimistic dataflow estimation in shaders with cycles (Addresses performance regression reported by Eero in GpuTest Piano). Performance numbers given above still apply. No shader-db changes with respect to master. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271 Reported-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-01-17 11:56:08 -08:00
Francisco Jerez	9355116bda	intel/fs: Don't let undefined values prevent copy propagation. This makes the dataflow propagation logic of the copy propagation pass more intelligent in cases where the destination of a copy is known to be undefined for some incoming CFG edges, building upon the definedness information provided by the last patch. Helps a few programs, and avoids a handful shader-db regressions from the next patch. shader-db results on ILK: total instructions in shared programs: 6541547 -> 6541523 (-0.00%) instructions in affected programs: 360 -> 336 (-6.67%) helped: 8 HURT: 0 LOST: 0 GAINED: 10 shader-db results on BDW: total instructions in shared programs: 8174323 -> 8173882 (-0.01%) instructions in affected programs: 7730 -> 7289 (-5.71%) helped: 5 HURT: 2 LOST: 0 GAINED: 4 shader-db results on SKL: total instructions in shared programs: 8185669 -> 8184598 (-0.01%) instructions in affected programs: 10364 -> 9293 (-10.33%) helped: 5 HURT: 2 LOST: 0 GAINED: 2 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-07 18:27:04 -08:00
Jose Maria Casanova Crespo	c57a3f200d	i965/fs: Add byte scattered read message and fs support v2: Fix alignment style (Topi Pohjolainen) (Jason Ekstrand) - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword. - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message. - Move scattered messages to surface messages namespace. - Assert align1 for scattered messages and assume Gen8+. - Inline brw_set_dp_byte_scattered_read. v3: (Jason Ekstrand) - Use renamed brw_byte_scattered_data_element_from_bit_size method - Assert scattered read for Gen8+ and Haswell. - Use conditional expresion at components_read. - Include comment about params for scattered opcodes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo	f1a9936ee1	i965/fs: Add byte scattered write message and fs support v2: (Jason Ekstrand) - Enable bit_size parameter to scattered messages to enable different bitsizes byte/word/dword. - Remove use of brw_send_indirect_scattered_message in favor of brw_send_indirect_surface_message. - Move scattered messages to surface messages namespace. - Assert align1 for scattered messages and assume Gen8+. - Inline brw_set_dp_byte_scattered_write. v3: - Remove leftover newline (Topi Pohjolainen) - Rename brw_data_size to brw_scattered_data_element and use defines instead of an enum (Jason Ekstrand) - Assert scattered write for Gen8+ and Haswell (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-12-06 08:57:18 +01:00
Jason Ekstrand	700bebb958	i965: Move the back-end compiler to src/intel/compiler Mostly a dummy git mv with a couple of noticable parts: - With the earlier header cleanups, nothing in src/intel depends files from src/mesa/drivers/dri/i965/ - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter - brw_util.[ch] is not really compiler specific, so it's moved to i965. v2: - move brw_eu_defines.h instead of brw_defines.h - remove no-longer applicable includes - add missing vulkan/ prefix in the Android build (thanks Tapani) v3: - don't list brw_defines.h in src/intel/Makefile.sources (Jason) - rebase on top of the oa patches [Emil Velikov: commit message, various small fixes througout] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-03-13 11:16:34 +00:00

Renamed from src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp (Browse further)

7 commits