fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Caio Oliveira	a0580dadfd	intel/compiler: Create a struct to hold SIMD selection state This is a preparation to decouple the storage of what SIMDs compiled/spilled from the cs_prog_data. This will allow reuse of SIMD selection code by Bindless Shaders. And since we have a struct now, move the error array there so reduce the boilerplate of the users. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>	2022-11-15 04:55:18 +00:00
Caio Oliveira	8cda6cd774	intel/compiler: Simplify usage of brw_simd_select_for_workgroup_size() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>	2022-11-15 04:55:18 +00:00
Caio Oliveira	ecc2dfc503	intel/compiler: Use std::unique_ptr for tracking the fs_visitors Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19605>	2022-11-10 18:01:52 +00:00
Ian Romanick	9abeb3d739	intel/fs: Optimize integer multiplication of large constants by factoring Many Intel platforms can only perform 32x16 bit multiplication. The straightforward way to implement 32x32 bit multiplications is by splitting one of the operands into high and low parts called H and L, repsectively. The full multiplication can be implemented as: ((A * H) << 16) + (A * L) On Intel platforms, special register accesses can be used to eliminate the shift operation. This results in three instructions and a temporary register for most values. If H or L is 1, then one (or both) of the multiplications will later be eliminated. On some platforms it may be possible to eliminate the multiplication when H is 256. If L is zero (note that H cannot be zero), one of the multiplications will also be eliminated. Instead of splitting the operand into high and low parts, it may possible to factor the operand into two 16-bit factors X and Y. The original multiplication can be replaced with (A * (X * Y)) = ((A * X) * Y). This requires two instructions without a temporary register. I may have gone a bit overboard with optimizing the factorization routine. It was a fun brainteaser, and I couldn't put it down. :) On my 1.3GHz Ice Lake, a standalone test could chug through 1,000,000 randomly selected values in about 5.7 seconds. This is about 9x the performance of the obvious, straightforward implementation that I started with. v2: Drop an unnecessary return. Rearrange logic slightly and rename variables in factor_uint32 to better match the names used in the large comment. Both suggested by Caio. Rearrange logic to avoid possibly using `a` uninitialized. Noticed by Marcin. v3: Use DIV_ROUND_UP instead of open coding it. Noticed by Caio. Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. (Ice Lake shown) total instructions in shared programs: 19912558 -> 19912526 (<.01%) instructions in affected programs: 3432 -> 3400 (-0.93%) helped: 10 / HURT: 0 total cycles in shared programs: 856413218 -> 856412810 (<.01%) cycles in affected programs: 122032 -> 121624 (-0.33%) helped: 9 / HURT: 0 No shader-db changes on any other Intel platforms. Tiger Lake and Ice Lake had similar results. (Ice Lake shown) Instructions in all programs: 141997227 -> 141996923 (-0.0%) Instructions helped: 71 Cycles in all programs: 9162524757 -> 9162523886 (-0.0%) Cycles helped: 63 Cycles hurt: 5 No fossil-db changes on any other Intel platforms. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
Ian Romanick	90d267b2d1	intel/fs: Fix bounds checking for integer multiplication lowering The previous bounds checking would cause mul(8) g121<1>D g120<8,8,1>D 0xec4dD to be lowered to mul(8) g121<1>D g120<8,8,1>D 0xec4dUW mul(8) g41<1>D g120<8,8,1>D 0x0000UW add(8) g121.1<2>UW g121.1<16,8,2>UW g41<16,8,2>UW Instead of picking the bounds (and the new type) based on the old type, pick the new type based on the value only. This helps a few fossil-db shaders in Witcher 3 and Geekbench5. No changes on any other Intel platforms. Tiger Lake Instructions in all programs: 157581069 -> 157580768 (-0.0%) Instructions helped: 24 Cycles in all programs: 7566979620 -> 7566977172 (-0.0%) Cycles helped: 22 Cycles hurt: 4 Ice Lake Instructions in all programs: 141998965 -> 141998667 (-0.0%) Instructions helped: 26 Cycles in all programs: 9162568666 -> 9162565297 (-0.0%) Cycles helped: 24 Cycles hurt: 2 Skylake No changes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
Lionel Landwerlin	e59c4a912b	intel/fs: use fs implementation of dump_instructions This specialized version prints out the liveness count as well as the maximum liveness count. It was eye opening when seeing the max liveness jump after lowering of packing instructions which should not have changed the count. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>	2022-10-27 21:05:00 +00:00
Lionel Landwerlin	e5dfff0946	intel/fs: reduce liveness of variables in lowering passes When lowering a single instruction with a destination VGRF to 2 or more, the VGRF is now considered partially written by each generated instruction and that increases its liveness especially in loops. Thus potentially increasing the number of spills/fills due to register allocation. Putting an UNDEF instruction in front of the lowered instructions allows the IR to limit the liveness of the VGRF, reducing register pressure. This has a pretty dramatic effect on spills/fills for RT shaders. Here the stats on Q2RTX shaders on DG2 (wipping out any spills/fills due to register allocation) : Instructions in all programs: 26150 -> 24955 (-4.6%) SENDs in all programs: 1148 -> 1148 (+0.0%) Loops in all programs: 4 -> 4 (+0.0%) Cycles in all programs: 392179 -> 332787 (-15.1%) Spills in all programs: 132 -> 116 (-12.1%) Fills in all programs: 262 -> 154 (-41.2%) Shader-db results on TGL : total instructions in shared programs: 21158140 -> 21158377 (<.01%) instructions in affected programs: 76629 -> 76866 (0.31%) helped: 18 HURT: 20 helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12 helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77% HURT stats (abs) min: 1 max: 79 x̄: 28.85 x̃: 18 HURT stats (rel) min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79% 95% mean confidence interval for instructions value: -4.82 17.30 95% mean confidence interval for instructions %-change: -0.34% 0.57% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 5753 -> 5753 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 798856834 -> 798870688 (<.01%) cycles in affected programs: 6208395 -> 6222249 (0.22%) helped: 22 HURT: 17 helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782 helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44% HURT stats (abs) min: 2 max: 19178 x̄: 2676.12 x̃: 1358 HURT stats (rel) min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71% 95% mean confidence interval for cycles value: -952.19 1662.65 95% mean confidence interval for cycles %-change: -0.64% 1.90% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4078 -> 4066 (-0.29%) spills in affected programs: 40 -> 28 (-30.00%) helped: 2 HURT: 0 total fills in shared programs: 2856 -> 2832 (-0.84%) fills in affected programs: 127 -> 103 (-18.90%) helped: 2 HURT: 0 total sends in shared programs: 998554 -> 998554 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>	2022-10-27 21:05:00 +00:00
Lionel Landwerlin	dd6d40429b	intel/fs: make split_virtual_grfs deal with partial undefs v2: fix up UNDEFs instructions (Curro) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>	2022-10-27 21:05:00 +00:00
Tapani Pälli	c9c9a5b78d	intel/fs: mark debug variables with ASSERTED To clean up compilation warnings about unused variables when asserts are disabled. v2: UNUSED -> ASSERTED (Eric Engestrom) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19016>	2022-10-11 04:14:30 +00:00
Lionel Landwerlin	23c7142cd6	anv: disable SIMD16 for RT shaders Since divergence is a lot more likely in RT than compute, it makes sense to limit ourselves to SIMD8. The trampoline shader defaults to SIMD16 since this one is uniform. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:37 +00:00
Lionel Landwerlin	9dba8d8aa1	intel/fs: take a builder arg for resolve_source_modifiers() There will be situations where we will want to use a local builder rather than the one associated with NIR->backend translation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>	2022-09-28 05:38:36 +00:00
Kenneth Graunke	be21d54aca	intel/compiler: Use an existing URB write to end TCS threads when viable VS, TCS, TES, and GS threads must end with a URB write message with the EOT (end of thread) bit set. For VS and TES, we shadow output variables with temporaries and perform all stores at the end of the shader, giving us an existing message to do the EOT. In tessellation control shaders, we don't defer output stores until the end of the thread like we do for vertex or evaluation shaders. We just process store_output and store_per_vertex_output intrinsics where they occur, which may be in control flow. So we can't guarantee that there's a URB write being at the end of the shader. Traditionally, we've just emitted a separate URB write to finish TCS threads, doing a writemasked write to an single patch header DWord. On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is a convenient spot to do so. But on other platforms, there's no such field, and this write is purely wasteful. Insetad of emitting a separate write, we can just look for an existing URB write at the end of the program and tag that with EOT, if possible. We already had code to do this for geometry shaders, so just lift it into a helper function and reuse it. No changes in shader-db. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>	2022-09-27 18:17:42 -07:00
Lionel Landwerlin	139e8f4635	intel/fs: fixup a64 messages And run algebraic when either int64 for float64 are not supported so those don't end up in the generated code. Cc: mesa-stable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>	2022-09-23 08:29:17 +00:00
Caio Oliveira	0b6e613de8	intel/compiler: Create and use struct for CS thread payload Move subgroup_id, that's only used by CS for verx10 < 125, as part of the payload too -- even though is not, strictly speaking. Note the thread execution of Task/Mesh is similar enough, so we make their common struct inherit from cs_thread_payload. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	d8461e975a	intel/compiler: Export brw_get_subgroup_id_param_index() Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	9de790760e	intel/compiler: Create and use struct for Bindless thread payload Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	5b6987daee	intel/compiler: Create and use struct for GS thread payload Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	7664c85b1d	intel/compiler: Create and use struct for TASK and MESH thread payloads Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	0ca65b3c4c	intel/compiler: Create and use struct for VS thread payload Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	19c6e1b447	intel/compiler: Create and use struct for TES thread payload Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	9a9b1119b4	intel/compiler: Store Patch URB output in TCS thread payload struct Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	e21359ed0e	intel/compiler: Create struct for TCS thread payload Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Caio Oliveira	73920b7e2f	intel/compiler: Use FS thread payload only for FS Move the setup into the FS thread payload constructor. Consolidate payload setup for that in brw_fs_thread_payload.cpp file. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>	2022-09-13 01:44:24 +00:00
Marcin Ślusarz	2e1b96bb1b	intel/compiler: implement EXT_mesh_shader Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18371>	2022-09-02 17:40:47 +00:00
Kenneth Graunke	d689ef7482	intel/compiler: Change dg2_plus check to devinfo->verx10 >= 125 Less special casing and possibly more future-proof. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17990>	2022-08-29 10:28:32 +00:00
Lionel Landwerlin	f242c9af76	intel/fs: bump max SIMD size for A64 atomics with LSC Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>. Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>	2022-08-24 17:51:40 +00:00
Lionel Landwerlin	a81ca32f96	intel/fs: remove unused opcode Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>	2022-08-24 17:51:40 +00:00
Lionel Landwerlin	aa65f83203	intel/fs: switch compute push constant loads to LSC We're now able to load up to 8 GRFs in one send. v2: Switch to use transpose + vector of up to 64 (Thanks Curro!) v3: Increase parallelism by not reusing the same register for push constant offset (Curro) v4: Drop dead ADD() instruction (Curro) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>	2022-08-24 17:51:40 +00:00
Caio Oliveira	a1b1fdf70d	intel/compiler: Rename 8_PATCH to MULTI_PATCH Make it clearer we are dealing with multiple patches, works better in constrast with SINGLE_PATCH. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18151>	2022-08-24 00:39:57 +00:00
Kenneth Graunke	2cea0d6ef6	intel/compiler: Drop variable group size lowering This backend lowering code has been dead since the removal of i965 - nothing in the current source tree ever sets the flag. This is handled by iris_setup_uniforms() and crocus_setup_uniforms(). Variable group size does not appear to be a feature in anv. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18055>	2022-08-18 16:17:03 +00:00
Lionel Landwerlin	734384e8bc	intel/fs: fixup simd selection with shader calls Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17908>	2022-08-05 11:51:31 +00:00
Lionel Landwerlin	9cb9390962	intel/fs: store num of resume shaders in prog_data That way we can look at the SBT entries for debug purposes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17908>	2022-08-05 11:51:31 +00:00
Marcin Ślusarz	883acc4150	intel/compiler: use NIR_PASS more Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17619>	2022-08-02 10:07:05 +00:00
Marcin Ślusarz	7ebae85955	intel/compiler: insert URB fence before task/mesh termination Bspec 53421 says: "A URB fence memory is typically performed prior the thread exit message, so that the next thread dispatch that reads that URB memory will see it." Cc: 22.1 <mesa-stable> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16665>	2022-08-02 09:31:24 +00:00
Kenneth Graunke	49ee3ae9e8	intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+ This allows the software scoreboarding pass, scheduler, and so on to handle the individual instructions and handle them, rather than trusting in the generator to do scoreboarding correctly when expanding the virtual instruction to multiple actual instructions. By using SHADER_OPCODE_READ_SR_REG, we also correctly handle the software scoreboarding workaround when reading DMask/VMask. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17530>	2022-08-02 08:41:43 +00:00
Ian Romanick	377246318a	intel/fs: Eliminate "masked" and "per slot offset" URB messages All of this information can be inferred from the sources. v2: Fix "error: unused variable 'opcode'" detected by marge-bot. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>	2022-07-26 17:25:19 +00:00
Ian Romanick	1b17f8fc5a	intel/fs: Make logical URB read instructions more like other logical instructions No shader-db changes on any Intel platform Fossil-db results: Tiger Lake Instructions in all programs: 156926440 -> 156926470 (+0.0%) Instructions hurt: 15 Cycles in all programs: 7513099349 -> 7513099402 (+0.0%) Cycles hurt: 15 Ice Lake and Skylake had similar results. (Ice Lake shown) Cycles in all programs: 9099036492 -> 9099036489 (-0.0%) Cycles helped: 1 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>	2022-07-26 17:25:19 +00:00
Ian Romanick	349a040f68	intel/fs: Make logical URB write instructions more like other logical instructions The changes to fs_visitor::validate() helped track down a place where I initially forgot to convert a message to the new sources layout. This had caused a different validation failure in dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing, but this were not detected until after SENDs were lowered. Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 19951145 -> 19951133 (<.01%) instructions in affected programs: 2429 -> 2417 (-0.49%) helped: 8 / HURT: 0 total cycles in shared programs: 858904152 -> 858862331 (<.01%) cycles in affected programs: 5702652 -> 5660831 (-0.73%) helped: 2138 / HURT: 1255 Broadwell total cycles in shared programs: 904869459 -> 904835501 (<.01%) cycles in affected programs: 7686744 -> `7652786` (-0.44%) helped: 2861 / HURT: 2050 Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 141442369 -> 141442032 (-0.0%) Instructions helped: 337 Cycles in all programs: 9099270231 -> 9099036492 (-0.0%) Cycles helped: 40661 Cycles hurt: 28606 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>	2022-07-26 17:25:18 +00:00
Emma Anholt	94bd06256a	intel/fs: Simplify brw_barycentric_mode() args. Reduce a bit of mode lookup noise I was tracing through trying to resolve the previous bug. Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>	2022-07-19 01:25:47 +00:00
Lionel Landwerlin	2d1f021e16	intel/fs: Set NonPerspectiveBarycentricEnable when the interpolator needs it. [anholt: changed to make all drivers do the right thing by moving the payload barycentric check into the compiler] Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>	2022-07-19 01:25:47 +00:00
Jason Ekstrand	3346f6918f	intel/fs,anv: Rework handling of coarse and sample shading Now that this information is accurately gathered by spirv_to_nir, we no longer need the hack. We just need to fix up the way we handle some of the key bits. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>	2022-07-13 20:28:42 +00:00
Jason Ekstrand	d0b154319d	intel/fs: Simplify persample_dispatch Thanks to the previous commit, we no longer need this check. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>	2022-07-13 20:28:42 +00:00
Jason Ekstrand	fd17aaf430	intel/fs: Use nir_lower_single_sampled This lets us drop demote_sample_qualifiers as well as a back-end check for key->multisample_fbo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>	2022-07-13 20:28:42 +00:00
Jason Ekstrand	ca9f0f72db	intel/fs: Use shader_info::fs::uses_sample_shading NIR constructs this information for us as part of nir_gather_info these days so we can simplify our logic a bit. This will also let us be more correct once we move uses_sample_shading scraping earlier. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>	2022-07-13 20:28:42 +00:00
Jason Ekstrand	530de844ef	intel,anv,iris,crocus: Drop subgroup size from the shader key Use nir->info.subgroup_size instead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17337>	2022-07-08 22:47:22 +00:00
Ian Romanick	bbcb881f46	intel/fs: Remove non-_LOGICAL URB messages The _LOGICAL versions are lowered direct to SEND, so nothing can ever generate these messages. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Ian Romanick	a477587b4a	intel/fs: Add _LOGICAL versions of URB messages The lowering is currently fake. It just changes the opcode from the _LOGICAL version to the non-_LOGICAL version. v2: Remove some rebase cruft. 's/gfx8_//;s/simd8_/' in brw_instruction_name. Both suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Ian Romanick	07b9bfacc7	intel/compiler: Move logical-send lowering to a separate file brw_fs.cpp was 10kloc. Now it's only 7.5kloc. Ugh. v2: Rebase on `9680e0e4a2`. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Lionel Landwerlin	9680e0e4a2	intel/fs: ray query fix for global address With stages dispatching with a mask, we can run into situations where we don't have the global address in all lanes. The existing code always assumed we had the addres in at least lane0. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `bb40e999d1` ("intel/nir: use a single intel intrinsic to deal with ray traversal") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>	2022-07-08 00:36:04 +00:00
Kenneth Graunke	589b03d02f	intel/fs: Opportunistically split SEND message payloads While we've taken advantage of split-sends in select situations, there are many other cases (such as sampler messages, framebuffer writes, and URB writes) that have never received that treatment, and continued to use monolithic send payloads. This commit introduces a new optimization pass which detects SEND messages with a single payload, finds an adjacent LOAD_PAYLOAD that produces that payload, splits it two, and updates the SEND to use both of the new smaller payloads. In places where we manually used split SENDS, we rely on underlying knowledge of the message to determine a natural split point. For example, header and data, or address and value. In this pass, we instead infer a natural split point by looking at the source registers. Often times, consecutive LOAD_PAYLOAD sources may already be grouped together in a contiguous block, such as a texture coordinate. Then, there is another bit of data, such as a LOD, that may come from elsewhere. We look for the point where the source list switches VGRFs, and split it there. (If there is a message header, we choose to split there, as it will naturally come from elsewhere.) This not only reduces the payload sizes, alleviating register pressure, but it means that we may be able to eliminate some payload construction altogether, if we have a contiguous block already and some extra data being tacked on to one side or the other. shader-db results for Icelake are: total instructions in shared programs: 19602513 -> 19369255 (-1.19%) instructions in affected programs: 6085404 -> 5852146 (-3.83%) helped: 23650 / HURT: 15 helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3 helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15% HURT stats (abs) min: 1 max: 44 x̄: 7.20 x̃: 2 HURT stats (rel) min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00% 95% mean confidence interval for instructions value: -10.16 -9.55 95% mean confidence interval for instructions %-change: -3.84% -3.72% Instructions are helped. total cycles in shared programs: 848180368 -> 842208063 (-0.70%) cycles in affected programs: 599931746 -> 593959441 (-1.00%) helped: 22114 / HURT: 13053 helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22 helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75% HURT stats (abs) min: 1 max: 94022 x̄: 526.67 x̃: 22 HURT stats (rel) min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61% 95% mean confidence interval for cycles value: -222.87 -116.79 95% mean confidence interval for cycles %-change: -1.44% -1.20% Cycles are helped. total spills in shared programs: 8387 -> 6569 (-21.68%) spills in affected programs: 5110 -> 3292 (-35.58%) helped: 359 / HURT: 3 total fills in shared programs: 11833 -> 8218 (-30.55%) fills in affected programs: 8635 -> 5020 (-41.86%) helped: 358 / HURT: 3 LOST: 1 SIMD16 shader, 659 SIMD32 shaders GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%) Examining these results: the few shaders where spills/fills increased were already spilling significantly, and were only slightly hurt. The applications affected were also helped in countless other shaders, and other shaders stopped spilling altogether or had 50% reductions. Many SIMD16 shaders were gained, and overall we gain more SIMD32, though many close to the register pressure line go back and forth. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>	2022-07-01 02:05:45 +00:00

... 4 5 6 7 8 ...

789 commits