fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-27 23:08:12 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	4c2d9729df	iris: Tidy BO sizing code and comments Buckets haven't been power of two sized in over a decade. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:42:15 -07:00
Kenneth Graunke	7acc88a47c	iris: Move some field setting after we drop the lock. It's not much, but we may as well hold the lock for a bit less time. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:42:04 -07:00
Kenneth Graunke	76c5a19668	iris: Move cached BO allocation into a helper function. There's enough going on here to warrant a helper. This also simplifies the control flow and eliminates the last non-error-case goto. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:52 -07:00
Kenneth Graunke	cea6671395	iris: Fall back to fresh allocations of mapping for zero-memset fails. It is unlikely that we would fail to map a cached BO in order to zero its contents. When we did, we would free the first BO in the cache and try again with the second. It's possible that this next BO already had a map setup, in which case we'd succeed. But if it didn't, we'd likely fail again in the same manner. There's not much point in optimizing this case (and frankly, if we're out of CPU-side VMA we should probably dump the cache entirely)...so instead, just fall back to allocating a fresh BO from the kernel which will already be zeroed so we don't have to try and map it. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:50 -07:00
Kenneth Graunke	042f8514e6	iris: Move fresh BO allocation into a helper function. There's enough going on here to warrant a helper. More cleaning coming. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:22 -07:00
Kenneth Graunke	06421e5be7	iris: Do SET_TILING at a single point rather than in two places. Both the from-cache and fresh-from-GEM cases were calling SET_TILING. In the cached case, we would retry the allocation on failure, pitching one BO from the cache each time. This is silly, because the only time it should fail is if the tiling or stride parameters are unacceptable, which has nothing to do with the particular BO in question. So there's no point in retrying - we should simply fail the allocation. This patch moves both calls to bo_set_tiling_internal() below the cache/fresh split, so we have it at a single point in time instead of two. To preserve the ordering between SET_TILING and SET_DOMAIN, we move that below as well. (I am unsure if the order matters.) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:08 -07:00
Kenneth Graunke	43d835cb0f	iris: Use the BO cache even for coherent buffers on non-LLC. We mark snooped BOs as non-reusable, so we never return them to the cache. This means that we'd need to call I915_GEM_SET_CACHING to make any BO we find in the cache snooped. But then again, any BO we freshly allocate from the kernel will also be non-snooped, so it has the same issue. There's really no reason to skip the cache - we may as well use it to avoid the I915_GEM_CREATE overhead. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:40:18 -07:00
Kenneth Graunke	78003014d0	iris: Fix locking around vma_alloc in iris_bo_create_userptr util_vma needs to be protected by a lock. All other callers of vma_alloc and vma_free appear to be holding a lock already. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:40:16 -07:00
Kenneth Graunke	5fc11fd988	iris: Fix lock/unlock mismatch for non-LLC coherent BO allocation. The goto jumped over the mtx_lock, but proceeded to hit the mtx_unlock. We can simply set the bucket to NULL and it will skip the cache without goto, and without messing up locking. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:40:15 -07:00
Kenneth Graunke	0f1b68ebee	iris: Re-emit Surface State Base Address when context is lost. When we hit a GPU hang, we failed to reset Surface State Base Address right away, and would keep hanging until we filled up the binder. Then we'd finally get it right after a lot of repeated stumbles. Update it right away so we hopefully hang fewer times before succeeding.	2019-05-29 16:35:02 -07:00
Jason Ekstrand	e459d6d6df	iris: Enable nir_opt_large_constants Shader-db results on Kaby Lake: total instructions in shared programs: 15306230 -> 15304726 (<.01%) instructions in affected programs: 4570 -> 3066 (-32.91%) helped: 16 HURT: 0 total cycles in shared programs: 361703436 -> 361680041 (<.01%) cycles in affected programs: 129388 -> 105993 (-18.08%) helped: 16 HURT: 0 LOST: 0 GAINED: 2 The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal Space Program Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Jason Ekstrand	9dc57eebd5	iris: Don't assume UBO indices are constant It will be true for the constant/system value buffer because they use a constant zero but it's not true in general. If we ever got here when the source wasn't constant, nir_src_as_uint would assert. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2019-05-29 21:09:16 +00:00
Jason Ekstrand	744f93f5c1	iris: Move upload_ubo_ssbo_surf_state to iris_program.c Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Kenneth Graunke	6892d2b94a	iris: Clone before calling nir_strip and serializing This is non-destructive and leaves the debugging information in place. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-29 18:16:32 +00:00
Kenneth Graunke	e1409aead5	iris: Only store the SHA1 of the NIR in iris_uncompiled_shader Jason pointed out that we don't need to keep an entire copy of the serialized NIR around, we just need the SHA1. This does change our disk cache key to be taking a SHA1 of a SHA1, which is a bit odd, but should work out and be faster and use less memory. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-29 18:16:32 +00:00
Kenneth Graunke	b5fa3abfc2	iris: Don't flag IRIS_DIRTY_URB after BLORP operations unless it changed We already flag IRIS_DIRTY_URB when we change it, but we were additionally flagging it on every BLORP operation, even if we didn't.	2019-05-26 17:45:18 -07:00
Kenneth Graunke	25afbb04c2	iris: Advertise coherent framebuffer fetches This lets us advertise GL_EXT_shader_framebuffer_fetch and GL_KHR_blend_equation_advanced_coherent support.	2019-05-23 08:13:10 -07:00
Kenneth Graunke	a2d7834457	gallium: Change PIPE_CAP_TGSI_FS_FBFETCH bool to PIPE_CAP_FBFETCH count TGSI's FBFETCH instruction currently only supports reading from a single render target, but NIR intrinsics can support multiple render targets. radeonsi can only support fetching from RT 0, but other drivers may be able to support fetching from any render target. To express this, this patch renames PIPE_CAP_TGSI_FS_FBFETCH to simply PIPE_CAP_FBFETCH, and converts it from a boolean "is FBFETCH supported?" to an integer number of render targets which can be fetched. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-23 08:13:07 -07:00
Kenneth Graunke	7d2b54e393	iris: Record state sizes for INTEL_DEBUG=bat decoding. Felix noticed a crash when using INTEL_DEBUG=bat decoding. It turned out that we were sometimes placing variable length data near the end of a buffer, and with the decoder guessing random lengths rather than having an actual count, it was walking off the end and crashing. So this does more than improve the decoder output. Unfortunately, this is a bit more complicated than i965's handling, because we don't have a single state buffer. Various places upload data via u_upload_mgr, and so there isn't a central place to record the size. We don't need to catch every single place, however, since it's only important to record variable length packets (like viewports and binding tables). State data also lives arbitrarily long, rather than being discarded on every batch like i965, so we don't know when to clear out old entries either. (We also don't have a callback when an upload buffer is released.) So, this tracking may space leak over time. That's probably okay though, as this is only a debugging feature and it's a slow leak. We may also get lucky and overwrite existing entries as we reuse BOs, though I find this unlikely to happen. The fact that the decoder works in terms of offsets from a state base address is also not ideal, as dynamic state base address and surface state base address differ for iris. However, because dynamic state addresses start from the top of a 4GB region, and binding tables start from addresses [0, 64K), it's highly unlikely that we'll get overlap. We can always improve this, but for now it's better than what we had.	2019-05-23 08:07:08 -07:00
Tapani Pälli	ed563b79df	iris: fix android build Fixes: `4756864cdc` "iris: Start wiring up on-disk shader cache" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-22 14:01:41 +03:00
Kenneth Graunke	6dc1c2d8bd	iris: Fix ALT mode regressions from shader cache We were checking this based on nir->info.name, but with the shader cache enabled, nir_strip throws out the name, causing us to use IEEE mode for ARB programs. gl-1.0-spot-light regressed because it wants ALT mode for 0^0 behavior. Fixes: `dc5dc727d5` iris: Serialize the NIR to a blob we can use for shader cache purposes.	2019-05-21 16:58:54 -07:00
Kenneth Graunke	fb1d08dcfd	iris: Expose the disk cache to the state tracker as well. This lets st/nir cache the NIR for shaders, based on the shader source string hash, allowing us to skip initial compiles altogether, and also letting us start from there should we need to recompile for NOS. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Dylan Baker	601c9bc135	iris: Cache assembly shaders in the on-disk shader cache This implements storing and retrieving iris_compiled_shader objects from the on-disk shader cache. (by Dylan Baker and Kenneth Graunke)	2019-05-21 15:05:38 -07:00
Kenneth Graunke	dc5dc727d5	iris: Serialize the NIR to a blob we can use for shader cache purposes. We will use a hash of the serialized NIR together with brw_prog_*_key (for NOS) as the disk cache key, where the disk cache contains actual assembly shaders. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Dylan Baker	4756864cdc	iris: Start wiring up on-disk shader cache This creates the on-disk shader cache data structure, and handles the build-id keying aspects. The next commits will fill it out so it's actually used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-21 15:05:38 -07:00
Kenneth Graunke	6ae2caf201	iris: Move iris_uncompiled_shader definition to iris_context.h It had been internal to iris_program.c, but with the upcoming disk cache code, the "program module" is going to be spread across a couple source files. Into a header it goes! Now it lives alongside iris_compiled_shader, which makes sense. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Kenneth Graunke	752367b766	iris: Dodge more GLSL IR lowering This avoids some lower_instructions bits in st.	2019-05-15 19:44:21 -07:00
Andrii Kryvytskyi	eca53f00aa	iris: Check if resource has stencil before returning it Signed-off-by: Andrii Kryvytskyi <andrii.o.kryvytskyi@globallogic.com> Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 21:16:11 -07:00
Kenneth Graunke	bb5db02bab	iris: Enable fragment shader interlock on Gen9+. There's some debate about whether we should support this on older hardware as well. Currently i965 turns it off on Gen8- though, so we follow suit. If this changes, we can update this as well. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-14 19:34:33 -07:00
Kenneth Graunke	646924cfa1	intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:30 -07:00
Nanley Chery	e81392868e	iris/resource: Drop redundant checks for aux support Drop some checks that are already done by ISL. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	75a3947af4	iris/resource: Fall back to no aux if creation fails No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update iris. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Eric Anholt	0c31fe9ee7	gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE. The _LEVELS assumes that the max is always power of two. For V3D 4.2, we can support up to 7680 non-power-of-two MSAA textures, which will let X11 support dual 4k displays on newer hardware. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 12:03:08 -07:00
Illia Iorin	a35269cf44	iris: Implement ARB_indirect_parameters iris_draw_vbo is divided into two functions to remove unnecessary operations from the loop. This implementation of ARB_indirect_parameters takes into account NV_conditional_render by saving MI_PREDICATE_RESULT at the start of a draw call and restoring it at the end. The result of NV_conditional_render is also taken into account when computing predicates that limit draw calls for ARB_indirect_parameters, in a similar way to `1952fd8d` in ANV. v2: Optimize indirect draws (suggested by Kenneth Graunke) v3: (by Kenneth Graunke) - Fix an issue where indirect draws wouldn't set patch information before updating the compiled TCS. - Move some code back to iris_draw_vbo to avoid duplicating it. - Fix minor indentation issues. Signed-off-by: Illia Iorin <illia.iorin@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-11 23:56:52 -07:00
Kenneth Graunke	21a0be4a79	iris: Split iris_update_draw_info into two functions. Shader draw parameters need updating on each iteration of a multidraw loop, but the primitive based information only needs to be updated once. Also, patch information needs to be recorded before filling out the TCS program key, as it determines the number of HS instances.	2019-05-11 23:54:15 -07:00
Kenneth Graunke	72ccefb529	iris: Use full ways for L3 cache setup on Icelake. Anuj fixed this in i965 and anv, but the fix never landed in iris. Fixes tessellation corruption on Icelake. Thanks to Rafael for bisecting this and tracking it down. Fixes: `d0996d5fab` iris: Emit default L3 config for the render pipeline Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-10 16:50:14 -07:00
Kenneth Graunke	c61862ddfc	iris: Expose PIPE_CAP_DEVICE_RESET_STATUS_QUERY This provides a way for the application to query whether any resets have happened, which lets us expose "robust" contexts. This also enables the KHR_robust_buffer_access_behavior tests.	2019-05-09 16:49:07 -07:00
Kenneth Graunke	343f41781c	iris: Hook up device reset callbacks This mechanism lets the driver inform the state tracker about GPU resets, say for destroying a robust API context and reporting a "device lost" error to the application, making it take action to deal with this.	2019-05-09 16:49:07 -07:00
Kenneth Graunke	c5c12bdd00	iris: Try to recover from GPU hangs. The iris batch module now tries to detect that the kernel has banned our GEM context, creates a new non-banned context, and informs the iris context module that all assumptions about state are now invalid and it needs to reinitialize the relevant state. Based on Chris Wilson's work, but significantly rewritten by me.	2019-05-09 16:49:07 -07:00
Chris Wilson	7402564c07	iris: Add helpers to clone a hardware context. (Chris Wilson wrote this code in a patch titled "i965: Be resilient in the face of GPU hangs"; Ken fixed a bug and copied it to iris.)	2019-05-09 16:49:07 -07:00
Kenneth Graunke	c3701e9070	iris: Mark render batches as non-recoverable. Adapted from Chris Wilson's patch. The comment is largely his. Currently, when iris hangs the GPU, it will continue sending batches which incrementally update the state, assuming it's preserved across batches. However, the kernel's GPU reset support reinitializes the guilty context to the default GPU state (reasonably not wanting to trust the current state). This ends up resetting critical things like STATE_BASE_ADDRESS, causing memory accesses in all subsequent batches to be garbage, and almost certainly result in more hangs until we're banned or we kill the machine. We now ask the kernel to ban our render context immediately, so we notice we've gone off the rails as fast as possible. Eventually, we'll attempt to recover and continue. For now, we just avoid torching the GPU over and over.	2019-05-09 16:49:07 -07:00
Chris Wilson	8b81256469	iris: Reorganise execbuf to have a single point of failure Propagate the failure from GEM_EXECBUFFER2, cleanup then report failure if need be. We retain the current behaviour to abort() at the first sign of trouble -- for a non-robustness context, arguably this is the right thing to do as the client cannot recover, and the system state is lost. How to properly integrate with KHR_robustness and reset-strategy is left as a future exercise. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-08 17:21:07 -07:00
Kenneth Graunke	d9b9bb91ff	iris: Report the same video memory settings as i965. This just copy and pastes Ian's code from i965.	2019-05-08 12:43:08 -07:00
Kenneth Graunke	a232aa5c50	iris: Also handle res->offset for buffer sampler/image views	2019-05-07 13:36:18 -07:00
Mike Blumenkrantz	ddd716e746	iris: support dmabuf imports with offsets this adds support for imports where the image data begins at an offset from the start of the buffer, as used in h/x264 fixes kwg/mesa#47 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-07 13:36:08 -07:00
Kenneth Graunke	a032a9665f	iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS This makes CompressedTexSubImage from a PBO source do proper GPU rendering to upload instead of stalling to map the PBO source on the CPU (then copying it on the CPU). Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this functionality, and to Jason Ekstrand for writing the code I adapted. Vulkan only supports a single layer, however, and this code tries to support multiple layers as long as it's miplevel 0. Improves performance in Sid Meier's Civilization VI: Average frame time (ms): -3.67423% +/- 1.46201% (n=5) 99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)	2019-05-06 09:50:32 -07:00
Kenneth Graunke	694d1a08d3	iris: Delete bucketing allocators These add a lot of complexity, and I currently can't measure any performance benefit from having them. In the past, I seem to recall seeing a benefit in drawoverhead scores, but currently it looks like dropping them is either a wash or 1-2% faster. Drop them to simplify allocations.	2019-05-03 19:50:26 -07:00
Kenneth Graunke	bd4b18d255	iris: Force VMA alignment to be a multiple of the page size. This should happen regardless, but let's be paranoid.	2019-05-03 19:48:37 -07:00
Kenneth Graunke	068a700195	iris: leave the top 4Gb of the high heap VMA unused This ports commit `9e7b0988d6` from anv to iris. Thanks to Lionel for noticing that it was missing!	2019-05-03 19:48:37 -07:00
Kenneth Graunke	21062e21d9	iris: Fix 4GB memory zone heap sizes. The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages, and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB. So we can't use the top page.	2019-05-03 19:48:37 -07:00
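
The arithmetic in the last entry (`21062e21d9`) can be checked directly. The following is a minimal sketch, assuming the 4096-byte page size and the 0xfffff maximum page count quoted in that commit message. The constant and function names are illustrative only and are not taken from the iris source.

```python
# Illustrative names only; none of these are identifiers from the iris driver.
PAGE_SIZE = 4096          # bytes per page, as used in the commit message
MAX_SIZE_FIELD = 0xFFFFF  # largest page count the "Size" field can hold, per the commit message
FOUR_GB = 1 << 32         # 4 GiB in bytes


def max_heap_bytes(max_pages: int = MAX_SIZE_FIELD, page_size: int = PAGE_SIZE) -> int:
    """Return the largest heap size in bytes that fits in the page-count field."""
    return max_pages * page_size


if __name__ == "__main__":
    usable = max_heap_bytes()
    # Largest size the field can describe, in bytes.
    print(usable)
    # Shortfall from a full 4 GB zone, in bytes.
    print(FOUR_GB - usable)
```

Under these assumptions the shortfall is exactly one page, which matches the commit's conclusion that the top page of each 4GB zone cannot be used.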

983 commits