fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Caio Oliveira	4f246cf4e7	intel/compiler: Merge child/latency arrays in schedule_node Values are used together, saves one pointer in schedule_node, reduces amount of reallocations when children count grows. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	e59a054203	intel/compiler: Move FS specific fields to fs_instruction_scheduler Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	a6297d05ca	intel/compiler: Remove virtual calls from scheduler Pull run() and schedule_instructions() for fs, and pull a very simplified version of those into a run() for vec4. Because of the previous patches the duplication is small. Since we are touching these, change run() implementations to use the cfg from the existing reference to the visitor/shader instead of taking one as argument. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	d76d58cf50	intel/compiler: Cache issue_time information Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	ecd7ffcf78	intel/compiler: Extract scheduling related basic functions Those will be used in multiple places later. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	8a8dd2db0c	intel/compiler: Add only available instructions to scheduling list The list was used for iterating through all instructions and then later also to track the available ones. Now that the array iteration is used, change how we fill it and rename it to reflect its only job. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	ddff6428c5	intel/compiler: Use array to iterate the scheduler nodes For all the preparation data collection before the scheduling actually happens, it is possible to walk the schedule nodes in order by iterating on the range of the array dedicated to a given block. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	fe6ac5a184	intel/compiler: Allocate all schedule_nodes at once Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	be012055da	intel/compiler: Remove reference to brw_isa_info from schedule_node It is always the same for all nodes, so use the one available in the scheduler itself. Also, per Matt's suggestion, collect is_haswell from devinfo instead of from a function argument. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Caio Oliveira	6987571737	intel/compiler: Use linear allocator in parts of brw_schedule_instructions Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>	2023-11-13 23:05:47 +00:00
Francisco Jerez	80e9031b44	intel/fs/xe2+: Fix grf_count in post-RA scheduling for updated register file size. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>	2023-09-20 17:19:36 -07:00
Kenneth Graunke	7eba19245d	intel/compiler: Move SCHEDULE_NONE handling into schedule_instructions() I'm going to introduce another call site for this function, and just handling SCHEDULE_NONE in the scheduler itself makes more sense than duplicating the logic. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Emma Anholt	10b94772d2	intel: Reduce cost of resetting last_grf_write. In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of last_grf_write 2400 times, multiple times through the scheduler. Just resetting for the processed instructions reduces runtime from 21s to 16s. No change on steam shader-db runtime across several runs. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>	2023-06-14 16:16:56 +00:00
Emma Anholt	7d4769e802	intel: Allocate the last_grf_write once per scheduler. No need to re-calloc it per block when we're going to use it again. Also, this fixes the vec4 backend to avoid allocating giant grf_count-sized arrays on the stack. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>	2023-06-14 16:16:56 +00:00
Emma Anholt	2ad865b219	intel: Count reads_remaining across all blocks. We were zeroing it out per block, but it doesn't actually help to count per block, since the question is "will scheduling this instruction free the reg?". Saves some memsetting, which was showing up high in the profile (but not from this source). No change on iris SKL shader-db. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>	2023-06-14 16:16:55 +00:00
Lionel Landwerlin	a66944dfbc	intel/fs: reuse descriptor helper Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>	2023-05-30 06:36:36 +00:00
Lionel Landwerlin	9471ffa70a	intel/fs: fix scheduling of HALT instructions With the following test : dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_out_of_bounds_load There is a : shader_start: ... <- no control flow g0 = some_alu g1 = fbl g2 = broadcast g3, g1 g4 = get_buffer_size g2 ... <- no control flow halt <- on some lanes g5 = send <surface>, g4 eliminate_find_live_channel will remove the fbl/broadcast because it assumes lane0 is active at get_buffer_size : shader_start: ... <- no control flow g0 = some_alu g4 = get_buffer_size g0 ... <- no control flow halt <- on some lanes g5 = send <surface>, g4 But then the instruction scheduler will move the get_buffer_size after the halt : shader_start: ... <- no control flow halt <- on some lanes g0 = some_alu g4 = get_buffer_size g0 g5 = send <surface>, g4 get_buffer_size pulls the surface index from lane0 in g0 which could have been turned off by the halt and we end up accessing an invalid surface handle. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20765>	2023-05-05 00:43:25 +03:00
Jason Ekstrand	714a291673	intel/compiler: Use SHADER_OPCODE_SEND for PI messages Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Lionel Landwerlin	13cca48920	intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more generic sends and drop this internal opcode. The idea behind this change is to allow bindless surfaces to be used for UBO pulls and why it's interesting to be able to reuse setup_surface_descriptors(). But that will come in a later change. No shader-db changes on TGL & DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>	2023-01-26 11:26:53 +00:00
Ian Romanick	bdc7668008	intel/fs: Lower URB messages to SEND Before rebasing on top of Ken's split-SEND optimization (see !17018), this commit just caused some scheduling changes in various tessellation and geometry shaders. These changes were caused by the addition of real latency information for the URB messages. With the addition of the split-SEND optimization, the changes are... staggering. All of the shaders helped for spills and fills are vertex shaders from Batman Arkham Origins. What surprises me is that these shaders account for such a high percentage of the spills and fills in fossil-db. 85%?!? v2: Use FIXED_GRF instead of BRW_GENERAL_REGISTER_FILE in an assertion. Suggested by Ken. Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 20013625 -> 19954020 (-0.30%) instructions in affected programs: 4007157 -> 3947552 (-1.49%) helped: 31161 HURT: 0 helped stats (abs) min: 1 max: 400 x̄: 1.91 x̃: 2 helped stats (rel) min: 0.08% max: 59.70% x̄: 2.20% x̃: 1.83% 95% mean confidence interval for instructions value: -1.97 -1.86 95% mean confidence interval for instructions %-change: -2.22% -2.18% Instructions are helped. total cycles in shared programs: 859337569 -> 858636788 (-0.08%) cycles in affected programs: 74168298 -> 73467517 (-0.94%) helped: 13812 HURT: 16846 helped stats (abs) min: 1 max: 291078 x̄: 82.83 x̃: 4 helped stats (rel) min: <.01% max: 37.09% x̄: 3.47% x̃: 2.02% HURT stats (abs) min: 1 max: 1543 x̄: 26.31 x̃: 14 HURT stats (rel) min: <.01% max: 77.97% x̄: 4.11% x̃: 2.58% 95% mean confidence interval for cycles value: -55.10 9.39 95% mean confidence interval for cycles %-change: 0.62% 0.77% Inconclusive result (value mean confidence interval includes 0). 
Broadwell total cycles in shared programs: 904844939 -> 904832320 (<.01%) cycles in affected programs: 525360 -> 512741 (-2.40%) helped: 215 HURT: 4 helped stats (abs) min: 4 max: 1018 x̄: 60.16 x̃: 39 helped stats (rel) min: 0.14% max: 15.85% x̄: 2.16% x̃: 2.04% HURT stats (abs) min: 79 max: 79 x̄: 79.00 x̃: 79 HURT stats (rel) min: 1.31% max: 1.57% x̄: 1.43% x̃: 1.43% 95% mean confidence interval for cycles value: -75.02 -40.22 95% mean confidence interval for cycles %-change: -2.37% -1.81% Cycles are helped. No shader-db changes on any older Intel platforms. Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 142622800 -> 141461114 (-0.8%) Instructions helped: 197186 Cycles in all programs: 9101223846 -> 9099440025 (-0.0%) Cycles helped: 37963 Cycles hurt: 151233 Spills in all programs: 98829 -> 13695 (-86.1%) Spills helped: 2159 Fills in all programs: 128142 -> 18400 (-85.6%) Fills helped: 2159 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>	2022-07-08 19:45:34 +00:00
Kenneth Graunke	72e9843991	intel/compiler: Introduce a new brw_isa_info structure This structure will contain the opcode mapping tables in the next commit. For now, this is the mechanical change to plumb it into all the necessary places, and it continues simply holding devinfo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Lionel Landwerlin	361b3fee3c	intel: move away from booleans to identify platforms v2: Drop changes around GFX_VERx10 == 75 (Luis) v3: Replace (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT) by (devinfo->platform == INTEL_PLATFORM_IVB) Replace (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X) by (devinfo->verx10 >= 45) Replace (devinfo->platform != INTEL_PLATFORM_G4X) by (devinfo->verx10 != 45) v4: Fix crocus typo v5: Rebase v6: Add GFX3, ILK & I965 platforms (Jordan) Move ifdef to code expressions (Jordan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>	2021-11-08 16:48:06 +00:00
Dave Airlie	8a81d14271	intel/vec4: sel.cond writes the flags on Gfx4 and Gfx5 This is the equivalent of idr's intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 except for the vec4 backend. This fixes buggy rendering seen with crocus on a qt trace. v2 (idr): Trivial whitespace change. Add unit tests. v3: Fix type in comment in unit tests. Noticed by Jason and Priit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Iron Lake total instructions in shared programs: 8183077 -> 8184543 (0.02%) instructions in affected programs: 198990 -> 200456 (0.74%) helped: 0 HURT: 1355 HURT stats (abs) min: 1 max: 8 x̄: 1.08 x̃: 1 HURT stats (rel) min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.96% 1.03% Instructions are HURT. total cycles in shared programs: 238967672 -> 238962784 (<.01%) cycles in affected programs: 4666014 -> 4661126 (-0.10%) helped: 406 HURT: 314 helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18 helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65% HURT stats (abs) min: 2 max: 112 x̄: 13.48 x̃: 12 HURT stats (rel) min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16% 95% mean confidence interval for cycles value: -8.60 -4.98 95% mean confidence interval for cycles %-change: -0.87% -0.49% Cycles are helped. GM45 total instructions in shared programs: 4986888 -> 4988354 (0.03%) instructions in affected programs: 198990 -> 200456 (0.74%) helped: 0 HURT: 1355 HURT stats (abs) min: 1 max: 8 x̄: 1.08 x̃: 1 HURT stats (rel) min: 0.29% max: 6.00% x̄: 0.99% x̃: 0.70% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.96% 1.03% Instructions are HURT. 
total cycles in shared programs: 153577826 -> 153572938 (<.01%) cycles in affected programs: 4666014 -> 4661126 (-0.10%) helped: 406 HURT: 314 helped stats (abs) min: 4 max: 54 x̄: 22.46 x̃: 18 helped stats (rel) min: <.01% max: 12.80% x̄: 1.82% x̃: 0.65% HURT stats (abs) min: 2 max: 112 x̄: 13.48 x̃: 12 HURT stats (rel) min: <.01% max: 7.82% x̄: 0.81% x̃: 0.16% 95% mean confidence interval for cycles value: -8.60 -4.98 95% mean confidence interval for cycles %-change: -0.87% -0.49% Cycles are helped. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>	2021-08-11 13:09:32 -07:00
Ian Romanick	38807ceeae	intel/fs: sel.cond writes the flags on Gfx4 and Gfx5 On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented using a separate cmpn and sel instruction. This lowering occurs in fs_visitor::lower_minmax which is called very, very late... a long, long time after the first calls to opt_cmod_propagation. As a result, conditional modifiers can be incorrectly propagated across sel.cond on those platforms. No tests were affected by this change, and I find that quite shocking. After just changing flags_written(), all of the atan tests started failing on ILK. That required the change in cmod_propagation (and the addition of the prop_across_into_sel_gfx5 unit test). Shader-db results for ILK and GM45 are below. I looked at a couple before and after shaders... and every case that I looked at had experienced incorrect cmod propagation. This affected a LOT of apps! Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2, Gang Beasts, and on and on... :( I discovered this bug while working on a couple new optimization passes. One of the passes attempts to remove condition modifiers that are never used. The pass made no progress except on ILK and GM45. After investigating a couple of the affected shaders, I noticed that the code in those shaders looked wrong... investigation led to this cause. v2: Trivial changes in the unit tests. v3: Fix type in comment in unit tests. Noticed by Jason and Priit. v4: Tweak handling of BRW_OPCODE_SEL special case. Suggested by Jason. 
Fixes: `df1aec763e` ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Dave Airlie <airlied@redhat.com> Iron Lake total instructions in shared programs: 8180493 -> 8181781 (0.02%) instructions in affected programs: 541796 -> 543084 (0.24%) helped: 28 HURT: 1158 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50% HURT stats (abs) min: 1 max: 3 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23% 95% mean confidence interval for instructions value: 1.06 1.11 95% mean confidence interval for instructions %-change: 0.31% 0.38% Instructions are HURT. total cycles in shared programs: 239420470 -> 239421690 (<.01%) cycles in affected programs: 2925992 -> 2927212 (0.04%) helped: 49 HURT: 157 helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70 helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96% HURT stats (abs) min: 2 max: 48 x̄: 27.34 x̃: 24 HURT stats (rel) min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20% 95% mean confidence interval for cycles value: -0.80 12.64 95% mean confidence interval for cycles %-change: -0.31% <.01% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4985517 -> 4986207 (0.01%) instructions in affected programs: 306935 -> 307625 (0.22%) helped: 14 HURT: 625 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49% HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1 HURT stats (rel) min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22% 95% mean confidence interval for instructions value: 1.04 1.12 95% mean confidence interval for instructions %-change: 0.29% 0.36% Instructions are HURT. 
total cycles in shared programs: 153827268 -> 153828052 (<.01%) cycles in affected programs: 1669290 -> 1670074 (0.05%) helped: 24 HURT: 84 helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67 helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94% HURT stats (abs) min: 2 max: 48 x̄: 27.71 x̃: 24 HURT stats (rel) min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14% 95% mean confidence interval for cycles value: -1.94 16.46 95% mean confidence interval for cycles %-change: -0.29% 0.11% Inconclusive result (value mean confidence interval includes 0). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>	2021-08-11 13:09:20 -07:00
Lionel Landwerlin	91dcbf1f56	intel/compiler: Track latency/perf of LSC fences Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11759>	2021-07-12 11:39:03 +00:00
Sagar Ghuge	621cf9b1df	intel/fs: Lower Byte scattered r/w messages to LSC when available v2 (Jason Ekstrand): - Squash in brw_scheduler changes - Update brw_ir_performance Co-authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>	2021-06-30 16:17:18 +00:00
Sagar Ghuge	8f82c8aa1a	intel/fs: Lower untyped float atomic messages to LSC when available Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>	2021-06-30 16:17:18 +00:00
Mark Janes	bd40a1e8c9	intel/fs: Lower untyped atomic messages to LSC when available Bspec programming note mentions that "Atomic messages are always forced to "un-cacheable" in the L1 cache". We can make the L1 cache un-cacheable and L3 with write-back policy. v2: (Sagar Ghuge): - Fix caching policy for atomic messages - Fix simd exec size v3: (Sagar Ghuge): - Add atomic messages to brw_schedule_instructions v4: (Jason Ekstrand): - Rebase on lsc_msg_desc reworks Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com> Co-authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>	2021-06-30 16:17:18 +00:00
Mark Janes	4f86a70599	intel/fs: Lower DW untyped r/w messages to LSC when available This puts the basic infrastructure in place for lowering logical dataport messages to LSC messages. We start with the two most obvious opcodes and add more in later patches. v2 (Sagar Ghuge): - Pass required params to message desc - Remove duplicate mlen calculation - Change commit message. v3 (Jason Ekstrand): - Drop TGM support Co-authored-by: Jason Ekstrand <mark.a.janes@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>	2021-06-30 16:17:18 +00:00
Mark Janes	32ec0662fd	intel/compiler: Add LSC messages to brw_schedule_instructions v2 (Jason Ekstrand): - Use lsc_msg_desc_opcode() - Drop all opcodes for now and add them in later patches. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>	2021-06-30 16:17:18 +00:00
Lionel Landwerlin	d665c2dcf0	intel/compiler: use existing helpers to pull bits of descriptors v2: Use new RT descriptor helper Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>	2021-05-02 20:20:06 +00:00
Jason Ekstrand	134af5ada2	intel/compiler: Don't insert barriers for NULL sources Normally, we never see NULL in a source. However, starting with `eab1c55590`, we can with a SHADER_OPCODE_SEND if it only has the first payload. We were inserting barriers which adds unnecessary scheduling dependencies and takes a lot of compile time because inserting a single barrier is an O(n) operation. All the extra O(n) can have a surprisingly large effect. This cuts the runtime of dEQP-VK.binding_model.buffer_device_address.set3.depth3. basessbo.convertcheckuv2.store.single.std140.frag by a factor of 20x for a debug build. Shader-db results on ICL: total instructions in shared programs: 19918983 -> 19921610 (0.01%) instructions in affected programs: 884074 -> 886701 (0.30%) helped: 1688 HURT: 817 helped stats (abs) min: 1 max: 163 x̄: 4.23 x̃: 1 helped stats (rel) min: 0.02% max: 12.50% x̄: 1.08% x̃: 0.61% HURT stats (abs) min: 1 max: 2674 x̄: 11.95 x̃: 2 HURT stats (rel) min: 0.11% max: 70.22% x̄: 1.71% x̃: 1.03% 95% mean confidence interval for instructions value: -1.97 4.06 95% mean confidence interval for instructions %-change: -0.28% -0.06% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 976503324 -> 975884809 (-0.06%) cycles in affected programs: 82581703 -> 81963188 (-0.75%) helped: 4144 HURT: 5010 helped stats (abs) min: 1 max: 79294 x̄: 311.31 x̃: 8 helped stats (rel) min: <.01% max: 53.69% x̄: 2.00% x̃: 0.51% HURT stats (abs) min: 1 max: 92266 x̄: 134.04 x̃: 8 HURT stats (rel) min: <.01% max: 218.09% x̄: 3.25% x̃: 0.53% 95% mean confidence interval for cycles value: -119.85 -15.29 95% mean confidence interval for cycles %-change: 0.68% 1.07% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). 
total spills in shared programs: 10659 -> 12014 (12.71%) spills in affected programs: 441 -> 1796 (307.26%) helped: 7 HURT: 12 total fills in shared programs: 11551 -> 14429 (24.92%) fills in affected programs: 993 -> 3871 (289.83%) helped: 8 HURT: 11 total sends in shared programs: 1025832 -> 1025353 (-0.05%) sends in affected programs: 2241 -> 1762 (-21.37%) helped: 105 HURT: 1 helped stats (abs) min: 1 max: 87 x̄: 4.57 x̃: 2 helped stats (rel) min: 5.56% max: 54.72% x̄: 11.37% x̃: 10.00% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for sends value: -7.39 -1.65 95% mean confidence interval for sends %-change: -12.95% -7.70% Sends are helped. LOST: 93 GAINED: 109 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4648 Fixes: `eab1c55590` "intel/fs: Support SENDS in SHADER_OPCODE_SEND" Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10412>	2021-04-22 18:00:16 +00:00
Anuj Phogat	61e8636557	intel: Rename gen_device prefix to intel_device export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Anuj Phogat	e7e55af4d6	intel: Rename GENx keyword to GFXx Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "GEN[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/GEN\([[:digit:]]\+\)/GFX\1/g" Exclude the changes to modifiers: grep -E "I915_.*GFX" -rIl $SEARCH_PATH | xargs sed -ie "s/\(I915_.*\)GFX/\1GEN/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	1d296484b4	intel: Rename Genx keyword to Gfxx Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/Gen\([[:digit:]]\+\)/Gfx\1/g" Exclude changes in src/intel/perf/oa-*.xml: find src/intel/perf -type f \( -name "*.xml" \) | xargs sed -ie "s/Gfx/Gen/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	b75f095bc7	intel: Rename genx keyword to gfxx in source files Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/gen\([[:digit:]]\+\)/gfx\1/g" Exclude pack.h and xml changes in this patch: grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+_pack\.h\)/gen\1/g" grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+\.xml\)/gen\1/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	c1f3a778de	intel: Rename GENx prefix in macros to GFXx in source files Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "GEN" -rIl src/intel/genxml | grep -E ".py" | xargs sed -ie "s/GEN\([%{]\)/GFX\1/g" grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH | grep -E ".(\.c|\.h|\.y|\.l)" | xargs sed -ie "s/\([^_]\)GEN\([[:digit:]]\+\)/\1GFX\2/g" Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+": grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH | xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	abe9a71a09	intel: Rename gen field in gen_device_info struct to ver Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Jason Ekstrand	91192696e6	intel/fs: Add support for 16-bit A64 float and integer atomics The messages for those 16-bit operations still use 32-bit sources and destinations, so expand them accordingly when building the payload. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8750>	2021-03-18 00:13:40 +00:00
Lionel Landwerlin	8b6d22109f	intel/fs/vec4: add missing dependency in write-on-write fixed GRFs If we load constant data using pull constant SENDS, and we later load that register with some other data, we can end up in a situation where we don't track the initial fixed register write and therefore end up using uninitialized registers. This tracks write-on-write of fixed GRFs like we do for normal virtual GRFs. v2: Fix post_alloc_reg case (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9667>	2021-03-17 23:25:02 +00:00
Marcin Ślusarz	97c3ec6116	intel/compiler: cache computed register pressure benefit This halves the number of calls to get_register_pressure_benefit and decreases shader-db CPU time by ~1.5%. Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8741>	2021-01-29 11:31:39 +00:00
Jason Ekstrand	f9d549b2bf	intel/fs: Use BRW_OPCODE_HALT for discards We're about to start using it to implement nir_jump_halt which has nothing inherently to do with fragment shaders or discards. May as well name it for the HW instruction it generates. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:19:08 -06:00
Jason Ekstrand	e76e359007	intel/fs: Rename PLACEHOLDER_HALT to HALT_TARGET It's a bit more explicit and will play more nicely with what we're about to do. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:18:50 -06:00
Jason Ekstrand	75209d5bd1	intel/fs: Add and implement intel-specific ray-tracing intrinsics Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:10 +00:00
Jason Ekstrand	7280b0911d	intel/compiler: Add support for bindless shaders The Intel bindless thread dispatch model is very simple. When a compute shader is to be used for bindless dispatch, it can request a set of stack IDs. These are allocated per-dual-subslice by the hardware and recycled automatically when the stack ID is returned. Passed to the bindless dispatch are a global argument address, a stack ID, and an address of the BINDLESS_SHADER_RECORD to invoke. When the bindless shader is dispatched, it is passed its stack ID as well as the global and local argument pointers. The local argument pointer is the address of the BINDLESS_SHADER_RECORD plus some offset which is specified as part of the BINDLESS_SHADER_RECORD. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:09 +00:00
Caio Marcelo de Oliveira Filho	d372abe397	intel/fs: Add surface OWORD BLOCK opcodes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho	d3d2b73fa3	intel/fs: Add A64 OWORD BLOCK opcodes Based on a patch for OWORD BLOCK READ from Jason Ekstrand. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Jason Ekstrand	5abac85177	intel/fs: Rework scratch handling on Gen9+ The current scratch mechanism uses an MRF hack where we reserve a few GRF registers to treat like the MRF and we collect the data into that MRF region before doing a scratch write. We also use that region for the header for scratch reads. This commit changes things and gets rid of the MRF hack. Instead, we reserve a single register (which RA is free to pick) for the scratch header and use split sends for scratch writes to avoid having to do the copy. This should provide RA with more freedom in the presence of spilling as well as avoid some unnecessary data moves. In future, the new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own per-thread scratch base address calculations rather than depending on the scratch base address that gets pushed into g0. Having an opcode for this lets us do it once at the top of the shader rather than repeating it at every read/write. One other noticeable difference is the use of SHADER_OPCODE_SEND. We can get away with this thanks to the fact that we're now using a set to track which instructions are generated by spills and don't rely on the opcodes to find spill/fill instructions. This allows us to avoid adding more virtual opcodes and let the normal code paths handle things like scoreboard dependencies between header setup and the SEND. It also means that post-RA scheduling may be able to space out the header setup MOV and the SEND for better latency hiding. 
Shader-db results on Skylake: total spills in shared programs: 12137 -> 10604 (-12.63%) spills in affected programs: 6685 -> 5152 (-22.93%) helped: 274 HURT: 2 total fills in shared programs: 13065 -> 11515 (-11.86%) fills in affected programs: 9007 -> 7457 (-17.21%) helped: 275 HURT: 1 Shader-db results on Ice Lake: total spills in shared programs: 12482 -> 10953 (-12.25%) spills in affected programs: 6586 -> 5057 (-23.22%) helped: 275 HURT: 0 total fills in shared programs: 12819 -> 11234 (-12.36%) fills in affected programs: 7867 -> 6282 (-20.15%) helped: 274 HURT: 0 Shader-db results on Tigerlake: total spills in shared programs: 11689 -> 10233 (-12.46%) spills in affected programs: 4740 -> 3284 (-30.72%) helped: 259 HURT: 0 total fills in shared programs: 10840 -> 9443 (-12.89%) fills in affected programs: 6244 -> 4847 (-22.37%) helped: 259 HURT: 0 Fossil-db results on Ice Lake: Spills in all programs: 245249 -> 201633 (-17.8%) Fills in all programs: 366066 -> 314368 (-14.1%) More practically, this seems to give about a 0.5-1% perf boost in Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>	2020-10-13 21:59:27 +00:00
Marcin Ślusarz	5ea0b6a9c6	intel/compiler: initialize remaining fields of various classes These variables seem to be initialized before being used, so this patch is not fixing any bug, but leaving them uninitialized may become a bug after some refactoring. These classes were affected: fs_reg_alloc, fs_visitor, fs_generator, instruction_scheduler. Found by Coverity. Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>	2020-09-10 12:16:58 +00:00
Francisco Jerez	5e2a7e11b4	intel/ir: Remove scheduling-based cycle count estimates. The cycle count estimation logic part of the scheduler is now redundant with the shader performance modeling pass, and the estimates can be consolidated into the brw::performance analysis result object instead of being part of the CFG, which guarantees that the estimates cannot be accessed without previously calling the performance_analysis::require() method, which makes sure that the right analysis pass is executed at the right time if we don't already have up-to-date cached results. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00


80 commits