fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-22 19:38:10 +02:00

Author	SHA1	Message	Date
Dave Airlie	d54c07b4c4	mesa/*: use an internal enum for tessellation primitive types. To avoid dragging gl.h into places it has no business being, defined tessellation primitive mode to an enum. This has a lot of fallout all over the place. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Emma Anholt	f6ffefba3e	nir: Apply nir_opt_offsets to nir_intrinsic_load_uniform as well. Doing this for ir3 required adding a struct for limits of how much base to fold in (which NTT wants as well for its case of shared vars), otherwise the later work to lower to the 1<<9 word limit would emit more instructions. The shader-db results are that sometimes the reduction in NIR instruction count results in the fewer sampler prefetches due to the shader being estimated to be shorter (dota2, nexuiz): total instructions in shared programs: 8996651 -> 8996776 (<.01%) total cat5 in shared programs: 86561 -> 86577 (0.02%) Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>	2022-01-16 19:11:29 +00:00
Emma Anholt	b024102d7c	freedreno/ir3: Use nir_opt_offset for removing constant adds for shared vars. Saves some work in carchase and manhattan31: instructions in affected programs: 2842 -> 2818 (-0.84%) nops in affected programs: 1131 -> 1105 (-2.30%) non-nops in affected programs: 1236 -> 1238 (0.16%) mov in affected programs: 57 -> 61 (7.02%) dwords in affected programs: 2144 -> 2150 (0.28%) cat0 in affected programs: 1195 -> 1169 (-2.18%) cat1 in affected programs: 151 -> 155 (2.65%) cat2 in affected programs: 142 -> 140 (-1.41%) sstall in affected programs: 190 -> 178 (-6.32%) (ss) in affected programs: 63 -> 63 (0.00%) systall in affected programs: 532 -> 511 (-3.95%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>	2022-01-16 19:11:29 +00:00
Danylo Piliaiev	d77bfc117c	tu,ir3: Implement VK_KHR_shader_integer_dot_product - gen4 - has dp4acc and dp2acc, dp4acc is used to implement 4x8 dot product. - gen3 - has dp2acc, in OpenCL blob uses dp2acc for dot product on both get3 and gen4. - gen2 - unknown, lower everything. - gen1 - no dp2acc, lower everything. OpenCL blob doesn't advertise cl_qcom_dot_product8 but still generates code for it. The assembly is more verbose and uses yet to be documented mad32.u16 instruction. Passes: dEQP-VK.spirv_assembly.instruction.compute.opsdotkhr.* dEQP-VK.spirv_assembly.instruction.compute.opudotkhr.* dEQP-VK.spirv_assembly.instruction.compute.opsudotkhr.* dEQP-VK.spirv_assembly.instruction.compute.opsdotaccsatkhr.* dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.* dEQP-VK.spirv_assembly.instruction.compute.opsudotaccsatkhr.* Only packed 4x8 unsigned and mixed versions are accelerated. However in theory we should be able to do better for signed version than current NIR lowering. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:21:24 +02:00
Danylo Piliaiev	e1f89a1da2	ir3: Make nir compiler options a part of ir3_compiler This would allow for sub-gens to have different options. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:20:39 +02:00
Danylo Piliaiev	c1d5c318bc	ir3: New cat3 instructions * shrm - (src2 >> src1) & src3 * shlm - (src2 << src1) & src3 * shrg - (src2 >> src1) \| src3 * shlg - (src2 << src1) \| src3 * andg - (src2 & src1) \| src3 * dp2acc - dot product of two {i,u}8vec2 packed into SRC1 and SRC2, added to 32b SRC3 * dp4acc - dot product of two {i,u}8vec4 packed into SRC1 and SRC2, added to 32b SRC3 * wmm - vec4(x_1, x_2, x_3, x_4) * (y_1 + y_2 + y_3 + y_4), which is duplicated (1 << (SRC3 / 32)) times starting from DST register * wmm.accu - same as wmm but result is added to DST registers, however the first reg in each vec4 result is overwritten instead of accumulating. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:20:39 +02:00
Connor Abbott	1a1e25dcce	tu, ir3: Support runtime gl_SubgroupSize in FS We already supported it in the CS for computing the subgroup ID, but soon we'll need it in the FS too. Vertex stages will always have it lowered. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13960>	2022-01-10 10:58:28 +00:00
Connor Abbott	e6e34883a9	ir3: Add wavesize control This allows the wavesize to be controlled per-shader. This will be used by VK_EXT_subgroup_size_control, and freedreno will also need it if legacy ARB_shader_ballot is to be supported (since it forces a wavesize of 64 or less). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13960>	2022-01-10 10:58:28 +00:00
Connor Abbott	30237b3d9c	ir3: Pass shader to ir3_nir_post_finalize() We'll need to add shader-specific lowering for gl_SubgroupSize. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13960>	2022-01-10 10:58:28 +00:00
Connor Abbott	9ebc48005c	ir3, freedreno: Add options struct for ir3_shader_from_nir() We'll expand this in a moment. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13960>	2022-01-10 10:58:28 +00:00
Danylo Piliaiev	3792fbfcf6	ir3: Assert that we cannot have enough concurrent waves for CS with barrier If we have a compute shader that has a big workgroup, a barrier, and a branchstack which limits max_waves - this may result in a situation when we cannot run concurrently all waves of the workgroup, which would lead to a hang. Blob just explodes in such case. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14110>	2022-01-07 18:40:15 +00:00
Danylo Piliaiev	9ed4d49c97	ir3: Be able to reduce register limit for RA when CS has barriers If barriers are used, it must be possible for all waves in the workgroup to execute concurrently. Thus we may have to reduce the registers limit. Fixes a hang in "Digital Combat Simulator". Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14110>	2022-01-07 18:40:15 +00:00
Connor Abbott	cb45120556	ir3: Use (ss) for instructions writing shared regs The blob uses both nops and (ss). It turns out that in some rare cases the hardware does take more than 6 cycles, at least for movmsk, but adding nops is unnecessary. I believe the extra nops are only there due to the immaturity of the blob's implementation of subgroup ops, so we don't have to copy them - just handle shared reg producers the same as SFU instructions. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>	2022-01-07 14:26:08 +00:00
Connor Abbott	d45678cac4	ir3/postsched: Rename tex/sfu to sy/ss Analogous to the previous commit. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>	2022-01-07 14:26:08 +00:00
Connor Abbott	e6b35d606d	ir3/sched: Rename tex/sfu to sy/ss This now covers e.g. cat6 instructions as well, and ss will cover instructions writing shared regs as well. This is split out from the previous change to avoid too much churn and shouldn't cause any functional changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>	2022-01-07 14:26:08 +00:00
Connor Abbott	0cc4aca345	ir3: Use new (sy)/(ss) stall helpers in the compiler This fixes a few bad assumptions in the pre-RA and post-RA scheduler, for example that (sy) is only for texture instructions and (ss) is only for SFU instructions and (sy) and (ss) producers will always take the same number of cycles. This means we now start doing latency hiding for cat6 instructions like ldib and ldc. It also should make us hide latency more aggressively, since the number used for (sy) stall cycles was way lower than the real numbers for everything except ldc. Finally it unifies the various places (ss) soft nops were calculated. selected shader-db results: total nops in shared programs: 345278 -> 358959 (3.96%) nops in affected programs: 215622 -> 229303 (6.34%) helped: 690 HURT: 2430 helped stats (abs) min: 1 max: 125 x̄: 11.40 x̃: 5 helped stats (rel) min: 0.53% max: 100.00% x̄: 24.19% x̃: 18.52% HURT stats (abs) min: 1 max: 501 x̄: 8.87 x̃: 5 HURT stats (rel) min: 0.00% max: 9900.00% x̄: 52.36% x̃: 14.29% 95% mean confidence interval for nops value: 3.78 4.99 95% mean confidence interval for nops %-change: 28.21% 42.66% Nops are HURT. total mov in shared programs: 75049 -> 74110 (-1.25%) mov in affected programs: 15754 -> 14815 (-5.96%) helped: 566 HURT: 455 helped stats (abs) min: 1 max: 36 x̄: 4.52 x̃: 3 helped stats (rel) min: 0.83% max: 100.00% x̄: 35.85% x̃: 30.00% HURT stats (abs) min: 1 max: 35 x̄: 3.55 x̃: 3 HURT stats (rel) min: 0.00% max: 1100.00% x̄: 63.60% x̃: 25.00% 95% mean confidence interval for mov value: -1.25 -0.58 95% mean confidence interval for mov %-change: 2.92% 14.02% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). total last-baryf in shared programs: 80468 -> 67670 (-15.90%) last-baryf in affected programs: 63676 -> 50878 (-20.10%) helped: 309 HURT: 147 helped stats (abs) min: 1 max: 260 x̄: 49.20 x̃: 24 helped stats (rel) min: 0.60% max: 98.81% x̄: 37.92% x̃: 40.91% HURT stats (abs) min: 1 max: 115 x̄: 16.35 x̃: 12 HURT stats (rel) min: 0.96% max: 1933.33% x̄: 45.55% x̃: 7.89% 95% mean confidence interval for last-baryf value: -33.03 -23.10 95% mean confidence interval for last-baryf %-change: -21.52% -0.50% Last-baryf are helped. total sstall in shared programs: 133997 -> 126398 (-5.67%) sstall in affected programs: 86866 -> 79267 (-8.75%) helped: 1893 HURT: 598 helped stats (abs) min: 1 max: 77 x̄: 6.06 x̃: 4 helped stats (rel) min: 0.71% max: 100.00% x̄: 32.82% x̃: 16.67% HURT stats (abs) min: 1 max: 65 x̄: 6.47 x̃: 6 HURT stats (rel) min: 0.00% max: 900.00% x̄: 65.51% x̃: 25.00% 95% mean confidence interval for sstall value: -3.39 -2.71 95% mean confidence interval for sstall %-change: -12.19% -6.24% Sstall are helped. total systall in shared programs: 350304 -> 288234 (-17.72%) systall in affected programs: 234855 -> 172785 (-26.43%) helped: 1456 HURT: 260 helped stats (abs) min: 1 max: 574 x̄: 46.42 x̃: 27 helped stats (rel) min: 0.19% max: 100.00% x̄: 39.43% x̃: 36.06% HURT stats (abs) min: 1 max: 757 x̄: 21.20 x̃: 8 HURT stats (rel) min: 0.00% max: 180.95% x̄: 24.82% x̃: 12.50% 95% mean confidence interval for systall value: -39.31 -33.03 95% mean confidence interval for systall %-change: -31.49% -27.90% Systall are helped. total waves in shared programs: 236732 -> 235142 (-0.67%) waves in affected programs: 6142 -> 4552 (-25.89%) helped: 535 HURT: 17 helped stats (abs) min: 2 max: 8 x̄: 3.08 x̃: 2 helped stats (rel) min: 12.50% max: 75.00% x̄: 28.78% x̃: 25.00% HURT stats (abs) min: 2 max: 6 x̄: 3.53 x̃: 4 HURT stats (rel) min: 16.67% max: 75.00% x̄: 37.35% x̃: 33.33% 95% mean confidence interval for waves value: -3.04 -2.72 95% mean confidence interval for waves %-change: -28.10% -25.39% Waves are helped. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>	2022-01-07 14:26:08 +00:00
Connor Abbott	7e60978d30	ir3: Introduce systall metric and new helper functions Add new centralized functions which will replace the various places we hardcode 10 for the number of (ss) nops, add numbers for soft (sy) nops based on similar computerator experiments with ldc, sam, and ldib (the most common (sy) producers), and add a "systall" metric which is analogous to sstall. This also fixes some cases where we'd erroniously count ldl* as (sy) producers instead of (ss) producers when calculating sstall. This only switches over the metric reporting to the new functions, so there is no behavior change. The following commit will switch over the rest of the compiler. While we're at it, remove max_sun as it's never set. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>	2022-01-07 14:26:08 +00:00
Connor Abbott	603791bdeb	ir3: Bump type mismatch penalty to 3 After some experimentation with computerator, it seems on a618 that writing a full register and then reading half of it as a half register requires a delay of 6, the same as the delay for cat5/cat6 sources. The other direction only has a delay of 5, but just bump it unconditionally out of an abundance of caution. Fixes: `890de1a436` ("ir3/delay: Fix full->half and half->full delay") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>	2022-01-07 14:26:08 +00:00
Connor Abbott	d371d807eb	ir3/ra: Fix logic bug in compress_regs_left If we're allocating a source then we force is_killed to false, not to true. Fixes a regression in dEQP-GLES31.functional.synchronization.in_invocation.image_atomic_write_read later. Fixes: `0ffcb19b9d` ("ir3: Rewrite register allocation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>	2022-01-07 14:26:08 +00:00
Rob Clark	8a21b2fda0	freedreno/ir3: Dump const state with shader disasm Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14231>	2021-12-20 19:47:35 +00:00
Rob Clark	d1edc6d9a1	freedreno/computerator: Fix @buf header Order is important in the grammar, the more specific match needs to go first. Fixes: `ba1c989348` ("freedreno/computerator: pass iova of buffer to const register") Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14231>	2021-12-20 19:47:35 +00:00
Rob Clark	78c53f4888	freedreno/ir3: Handle instr->address when cloning Without this, a cloned instruction that takes full regs will trigger an ir3_validate assert. This can happen, for ex, if an instruction that writes p0.x and has a relative src gets cloned in ir3_sched. Fixes an assert in Genshin Impact with a debug build. Fixes: `9af795d9b9` ("ir3: Make ir3_instruction::address a normal register") Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14231>	2021-12-20 19:47:35 +00:00
Danylo Piliaiev	c749da6135	ir3,turnip: Add support for GL_KHR_shader_subgroup_quad Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13817>	2021-12-07 20:45:53 +00:00
Danylo Piliaiev	3dfd4230bb	ir3,turnip: Enable subgroup ops support in all stages on gen4 Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13817>	2021-12-07 20:45:53 +00:00
Danylo Piliaiev	ded51fd39e	ir3: Use getfiberid for SubgroupInvocationID on gen4 Since it requires (ss) categorize it as is_sfu() and not is_mem(). Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13817>	2021-12-07 20:45:53 +00:00
Danylo Piliaiev	d1c49901df	ir3: Add gen4 new subgroup instructions * getlast.w8 #4 - Perform jump for the first (CLUSTER_SIZE-1) fibers in a subgroup * brcst.active.w8 - necessary to implement arithmetic subgroup operations with prefix sum. * quad_shuffle.brcst - subgroupQuadBroadcast * quad_shuffle.horiz - subgroupQuadSwapHorizontal * quad_shuffle.vert - subgroupQuadSwapVertical * quad_shuffle.diag - subgroupQuadSwapDiagonal * getfiberid - gl_SubgroupID Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13817>	2021-12-07 20:45:53 +00:00
Emma Anholt	3748b8afce	freedreno/ir3: Make a shared helper for the tess factor stride. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6089>	2021-12-01 16:57:30 +00:00
Ilia Mirkin	f533d7a446	freedreno/ir3: get the post-lowering clip/cull mask The variant may include a lowered gl_Clip/CullDistance array. So we have to use the variant's info (which is not available). However we save off the clip/cull masks already, so just reuse those. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13891>	2021-11-28 02:55:58 -05:00
Ilia Mirkin	13fb587b8a	freedreno/ir3: indicate that clipdist arrays are in use We expose the compact array cap, which means that we get compact clipdist arrays. Indicate this to the lowering pass so that it works for gl_ClipDistance from fs, among others. Fixes, among others, on a420, tests/spec/glsl-1.30/execution/clipping/fs-clip-distance-interpolated.shader_test Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13891>	2021-11-28 02:55:58 -05:00
Danylo Piliaiev	a78c36ecc6	ir3/cp: Prevent setting an address on subgroup macros These macros expand to a mov in an if statement which breaks address assumption that instruction which produces address and consumes it are in the same block. Fixes test: dEQP-VK.subgroups.ballot_broadcast.framebuffer.subgroupbroadcast_intvertex Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13931>	2021-11-25 12:18:48 +00:00
Connor Abbott	969369e962	ir3/lower_subgroups: Fix potential infinite loop I was trying to be clever here, skipping ahead to the newly-created block and processing the remaining instructions after the split in the same loop. But if the last instruction in a block was lowered, the saved next instruction would be the head of the block before the split, not the new block, and we would compare it to the new block so we wouldn't stop like we were supposed to. Stop being so clever, and just restart processing with the new block after lowering an instruction. Because we're wrapping the actual transform in yet another loop, and the restarting logic is a bit tricky, refactor the actual lowering into a separate lower_instr function. Otherwise we'd be mixing the two and indenting the actual logic even more. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13928>	2021-11-25 10:16:48 +00:00
Danylo Piliaiev	99388f0c27	freedreno/ir3: handle global atomics Only for a6xx since we don't know the instructions for global atomics on previous gens. Per Qualcomm's docs in OpenCL atomics are only supported since a5xx together with Generic memory space. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>	2021-11-23 18:26:37 +00:00
Danylo Piliaiev	5d5b1fc472	freedreno/ir3: add a6xx global atomics and separate atomic opcodes Separating atomic opcodes makes possible to express a6xx global atomics which take iova in SRC1. They would be needed by VK_KHR_buffer_device_address. The change also makes easier to distiguish atomics in conditions. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>	2021-11-23 18:26:37 +00:00
Ilia Mirkin	be048ec112	freedreno/ir3: remove unused actual_in counting Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13918>	2021-11-23 17:20:32 +00:00
Danylo Piliaiev	ed16eedb2d	ir3: print half-dst/src for ldib.b/stib.b So it would print: ldib.b.untyped.1d.u16.1.imm.base0 hr0.z, r0.x, 0 instead of: ldib.b.untyped.1d.u16.1.imm.base0 r0.z, r0.x, 0 Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13876>	2021-11-22 12:32:15 +00:00
Connor Abbott	c98adc56f4	ir3/lower_pcopy: Fix bug with "illegal" copies and swaps If the source and destination were within the same full register, like hr90.x and hr90.y (which both map to r45.x), then we'd perform the swap/copy with the wrong register. This broke dEQP-VK.ssbo.phys.layout.random.16bit.scalar.35 once BDA is enabled. Fixes: `0ffcb19b9d` ("ir3: Rewrite register allocation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>	2021-11-19 16:59:54 +00:00
Connor Abbott	65da866ad9	ir3/lower_pcopy: Fix shr.b illegal copy lowering The immediate shouldn't be half-reg because the other source isn't. Fixes an assertion failure with dEQP-VK.ssbo.phys.layout.random.16bit.scalar.35. Fixes: `0ffcb19b9d` ("ir3: Rewrite register allocation") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>	2021-11-19 16:59:54 +00:00
Connor Abbott	9912c61362	ir3/spill: Support larger spill slot offset This is required by dEQP-VK.ssbo.phys.layout.random.all_shared_buffer.47, where we need to spill a lot of pointers due to NIR CSE being a little too aggressive and creating a large register pressure across basic blocks, too large to fit within the boundaries of ldp/stp offsets. Note that this will be a lot more difficult with support for "real functions" because the base register will become unknown at compile time. However this hack gets things working for the time being. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>	2021-11-19 16:59:54 +00:00
Connor Abbott	29d3889bbb	ir3/ra: Add missing asserts to ra_push_interval() This would've caught the previous issue earlier. We checked that the physreg made sense when inserting via ra_file_insert() but not ra_push_interval() which is used for live-range splitting. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>	2021-11-19 16:59:54 +00:00
Connor Abbott	9d88b98b08	ir3/ra: Consider reg file size when swapping killed sources Don't swap a 2-component vector of half-regs with a full reg if that would result in the half regs going outside of the allowable half-reg space. Fixes: `d4b5d2a020` ("ir3/ra: Use killed sources in register eviction") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13818>	2021-11-19 16:59:54 +00:00
Connor Abbott	23a5f1a5ac	ir3: Stop inserting nops during scheduling Not necessary since nothing uses it anymore. This might have a slight effect on spilling with multiple blocks, but no shader-db difference because nothing spills. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Connor Abbott	e0eeba6cbb	ir3/postsched: Only prefer tex/sfu if they are soft-ready Otherwise we schedule an SFU depending on a tex as soon as the tex is scheduled, which is very much not what we want. Note that sstall is helped more than nops are hurt, and the shaders with the largest nop regressions also have sstall helped. However (sy) is also very much helped. total nops in shared programs: 345482 -> 345986 (0.15%) nops in affected programs: 5731 -> 6235 (8.79%) helped: 15 HURT: 81 helped stats (abs) min: 1 max: 9 x̄: 3.27 x̃: 3 helped stats (rel) min: 0.50% max: 16.00% x̄: 8.55% x̃: 10.26% HURT stats (abs) min: 1 max: 72 x̄: 6.83 x̃: 4 HURT stats (rel) min: 0.57% max: 400.00% x̄: 32.50% x̃: 13.16% 95% mean confidence interval for nops value: 3.34 7.16 95% mean confidence interval for nops %-change: 13.07% 39.10% Nops are HURT. total sstall in shared programs: 133804 -> 132381 (-1.06%) sstall in affected programs: 4743 -> 3320 (-30.00%) helped: 68 HURT: 24 helped stats (abs) min: 1 max: 153 x̄: 21.88 x̃: 8 helped stats (rel) min: 1.79% max: 100.00% x̄: 33.20% x̃: 28.00% HURT stats (abs) min: 1 max: 11 x̄: 2.71 x̃: 2 HURT stats (rel) min: 1.02% max: 200.00% x̄: 17.73% x̃: 5.59% 95% mean confidence interval for sstall value: -22.05 -8.89 95% mean confidence interval for sstall %-change: -27.60% -12.22% Sstall are helped. total (ss) in shared programs: 35471 -> 35481 (0.03%) (ss) in affected programs: 462 -> 472 (2.16%) helped: 9 HURT: 15 helped stats (abs) min: 1 max: 2 x̄: 1.11 x̃: 1 helped stats (rel) min: 4.17% max: 33.33% x̄: 14.00% x̃: 7.69% HURT stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1 HURT stats (rel) min: 1.19% max: 50.00% x̄: 12.27% x̃: 8.33% 95% mean confidence interval for (ss) value: -0.14 0.97 95% mean confidence interval for (ss) %-change: -5.11% 9.94% Inconclusive result (value mean confidence interval includes 0). total (sy) in shared programs: 13522 -> 13288 (-1.73%) (sy) in affected programs: 422 -> 188 (-55.45%) helped: 22 HURT: 1 helped stats (abs) min: 1 max: 21 x̄: 10.68 x̃: 10 helped stats (rel) min: 8.00% max: 94.44% x̄: 56.58% x̃: 56.94% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00% 95% mean confidence interval for (sy) value: -13.18 -7.17 95% mean confidence interval for (sy) %-change: -65.48% -40.59% (sy) are helped. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Connor Abbott	6f5c0d209c	ir3/postsched: Rewrite delay handling Analogous to the pre-RA scheduler. Unfortunately this time it's a bit more involved because we have to correctly handle (rptN), which is already relevant for swz. This means we need the index of the destination register that conflicts with the source register, to handle swz, and we need to expose that part of ir3_delay. But once that's done, we can delete ir3_delay_calc_postra. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Connor Abbott	140e117f2b	ir3/delay: Ignore earlier definitions to the same register We have a situation in some skia shaders like: add.f r0.x, ... (rpt2)nop mul.f ..., r0.x sam (xyzw) r0.x, ... rcp ..., r0.x Notice that rcp uses the result of the sam instruction, not the add.f, but we didn't keep track of which instructions kill the sources in ir3_delay, so we'd add an extra nop, resulting in a disagreement betwen ir3_delay and the scheduling graph. Since postsched is correct, fix ir3_delay. This only results in some very slight shader-db changes but keeps the next commit from changing things. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Connor Abbott	a54e7baa65	ir3/postsched: Handle sync dependencies better We want to model soft dependencies, but because of how there's only a single bit to wait on all of them, there may be unnecessary delays inserted when a (sy)-consumer follows an unrelated (sy)-producer. Previously there was some code to try to work around this, but we can just model it directly using the sfu_delay and tex_delay cycle counts that we have to maintain anyway and delete it. This also gets rid of the calls to ir3_delay_postra with soft=true which would be more complicated to handle in the next commit. There is a functional change here: the idea of preferring less nop's over critical path length (max_delay) up to 3 nops is kept (and we delete the TODO which is already sort-of resolved by it), but delays due to (ss)/(sy) and nops are now treated equally, rather than always preferring nops over syncs. So if our estimate indicates that scheduling an (ss) consumer will result in a wait of one cycle and there's another instruction that will require one nop, we will treat them otherwise equal and choose based on max_delay instead. This results in more sstall, but the decrease in nops is much greater. total nops in shared programs: 376613 -> 345482 (-8.27%) nops in affected programs: 275483 -> 244352 (-11.30%) helped: 3226 HURT: 110 helped stats (abs) min: 1 max: 78 x̄: 9.73 x̃: 7 helped stats (rel) min: 0.19% max: 100.00% x̄: 19.48% x̃: 13.68% HURT stats (abs) min: 1 max: 16 x̄: 2.43 x̃: 2 HURT stats (rel) min: 0.00% max: 150.00% x̄: 13.34% x̃: 4.36% 95% mean confidence interval for nops value: -9.61 -9.06 95% mean confidence interval for nops %-change: -19.01% -17.78% Nops are helped. total sstall in shared programs: 126195 -> 133806 (6.03%) sstall in affected programs: 79440 -> 87051 (9.58%) helped: 300 HURT: 1922 helped stats (abs) min: 1 max: 15 x̄: 4.72 x̃: 4 helped stats (rel) min: 1.05% max: 100.00% x̄: 17.15% x̃: 14.55% HURT stats (abs) min: 1 max: 29 x̄: 4.70 x̃: 4 HURT stats (rel) min: 0.00% max: 900.00% x̄: 25.38% x̃: 10.53% 95% mean confidence interval for sstall value: 3.22 3.63 95% mean confidence interval for sstall %-change: 17.50% 21.78% Sstall are HURT. total (ss) in shared programs: 35190 -> 35472 (0.80%) (ss) in affected programs: 6433 -> 6715 (4.38%) helped: 163 HURT: 401 helped stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1 helped stats (rel) min: 1.92% max: 33.33% x̄: 11.53% x̃: 10.00% HURT stats (abs) min: 1 max: 3 x̄: 1.13 x̃: 1 HURT stats (rel) min: 1.56% max: 100.00% x̄: 15.33% x̃: 12.50% 95% mean confidence interval for (ss) value: 0.41 0.59 95% mean confidence interval for (ss) %-change: 6.22% 8.93% (ss) are HURT. total (sy) in shared programs: 13476 -> 13521 (0.33%) (sy) in affected programs: 669 -> 714 (6.73%) helped: 30 HURT: 78 helped stats (abs) min: 1 max: 2 x̄: 1.13 x̃: 1 helped stats (rel) min: 4.00% max: 50.00% x̄: 21.22% x̃: 21.11% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 3.45% max: 100.00% x̄: 31.93% x̃: 25.00% 95% mean confidence interval for (sy) value: 0.23 0.60 95% mean confidence interval for (sy) %-change: 11.19% 23.15% (sy) are HURT. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Connor Abbott	b9f61d7287	ir3/postsched: Fix copy-paste mistake Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Connor Abbott	d9a91318b1	ir3/sched: Rewrite delay handling The old code walked the instructions between each ready instruction and each of its parents for every instruction, which can quickly become accidentally quadratic. Instead we keep track of the current "instruction pointer" of the to-be-scheduled instruction, and for each ready instruction calculate an "earliest possible IP" which is the IP that needs to be reached before we can schedule it. Because this stays constant as soon as an instruction becomes ready, we never have to recompute it and each call to ir3_delay_calc_prera() becomes a simple comparison and subtract. We only need to iterate over the children and update their earliest_ip when scheduling an instruction, and we already do that in util_day_prune_head() so it should be cheap. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Connor Abbott	508f917d8c	util/dag: Make edge data a uintptr_t Nobody was actually using it as a pointer, and I'm going to introduce a shared function which relies on it not being a pointer so let's fix this once and for all. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Ilia Mirkin	aa93896156	freedreno/ir3: adjust condition for when to use ldib We have to use it any time that the image is writable. Otherwise writes from the same invocation won't have posted into the texture cache. See: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5629 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13807>	2021-11-16 18:22:29 +00:00
Ilia Mirkin	8c9a86cb57	freedreno/ir3: fix image-to-tex flags, remove 3d -> array hack The function would return both the 3d and array flags set for 2d array, and would return just 3d for cubes. Fix the flags so that they are appropriate for images. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13804>	2021-11-16 00:33:31 +00:00

1 2 3 4 5 ...

1151 commits