fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 15:40:11 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	d4a4384b31	iris: Implement INTEL_DEBUG=pc for pipe control logging. This prints a log of every PIPE_CONTROL flush we emit, noting which bits were set, and also the reason for the flush. That way we can see which are caused by hardware workarounds, render-to-texture, buffer updates, and so on. It should make it easier to determine whether we're doing too many flushes and why.	2019-06-20 13:32:15 -05:00
Caio Marcelo de Oliveira Filho	f346b277d1	iris: Create binding table slot for num_work_groups only when needed Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-11 17:57:37 -07:00
Caio Marcelo de Oliveira Filho	045aeccf0e	iris: Always reserve binding table space for NIR constants Don't have a separate mechanism for NIR constants to be removed from the table. If unused, we will compact it away. The use_null_surface is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	97cd865be2	iris: Compact binding tables Change the iris_binding_table to keep track of what surfaces are actually going to be used, then assign binding table indices just for those. Reducing unused bytes on those are valuable because we use a reduced space for those tables in Iris. The rest of the driver can go from "group indices" (i.e. UBO #2) to BTI and vice-versa using helper functions. The value IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is not used or a certain BTI is not valid. The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be set to skip compacting binding table. v2: (all from Ken) Use BITFIELD64_MASK helper. Improve comments. Assert all group is marked as used when we have indirects. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	79f1529ae0	iris: Create an enum for the surface groups This will make convenient to handle compacting and printing the binding table. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	1c8ea8b300	iris: Handle binding table in the driver Stop using brw_compiler to lower the final binding table indices for surface access. This is done by simply not setting the 'prog_data->binding_table.*_start' fields. Then make the driver perform this lowering. This is a better place to perfom the binding table assignments, since the driver has more information and will also later consume those assignments to upload resources. This also prepares us for two changes: use ibc without having to implement binding table logic there; and remove unused entries from the binding table. Since the `block` field in brw_ubo_range now refers to the final binding table index, we need to adjust it before using to index shs->constbuf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Jason Ekstrand	e459d6d6df	iris: Enable nir_opt_large_constants Shader-db results on Kaby Lake: total instructions in shared programs: 15306230 -> 15304726 (<.01%) instructions in affected programs: 4570 -> 3066 (-32.91%) helped: 16 HURT: 0 total cycles in shared programs: 361703436 -> 361680041 (<.01%) cycles in affected programs: 129388 -> 105993 (-18.08%) helped: 16 HURT: 0 LOST: 0 GAINED: 2 The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal Space Program Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Jason Ekstrand	744f93f5c1	iris: Move upload_ubo_ssbo_surf_state to iris_program.c Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Kenneth Graunke	7d2b54e393	iris: Record state sizes for INTEL_DEBUG=bat decoding. Felix noticed a crash when using INTEL_DEBUG=bat decoding. It turned out that we were sometimes placing variable length data near the end of a buffer, and with the decoder guessing random lengths rather than having an actual count, it was walking off the end and crashing. So this does more than improve the decoder output. Unfortunately, this is a bit more complicated than i965's handling, because we don't have a single state buffer. Various places upload data via u_upload_mgr, and so there isn't a central place to record the size. We don't need to catch every single place, however, since it's only important to record variable length packets (like viewports and binding tables). State data also lives arbitrarily long, rather than being discarded on every batch like i965, so we don't know when to clear out old entries either. (We also don't have a callback when an upload buffer is released.) So, this tracking may space leak over time. That's probably okay though, as this is only a debugging feature and it's a slow leak. We may also get lucky and overwrite existing entries as we reuse BOs, though I find this unlikely to happen. The fact that the decoder works in terms of offsets from a state base address is also not ideal, as dynamic state base address and surface state base address differ for iris. However, because dynamic state addresses start from the top of a 4GB region, and binding tables start from addresses [0, 64K), it's highly unlikely that we'll get overlap. We can always improve this, but for now it's better than what we had.	2019-05-23 08:07:08 -07:00
Kenneth Graunke	646924cfa1	intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:30 -07:00
Illia Iorin	a35269cf44	iris: Implement ARB_indirect_parameters iris_draw_vbo is divided into two functions to remove unnecessary operations from the loop. This implementation of ARB_indirect_parameters takes into account NV_conditional_render by saving MI_PREDICATE_RESULT at the start of a draw call and restoring it at the end also the result of NV_conditional_render is taken into account when computing predicates that limit draw calls for ARB_indirect_parameters in a similar way to `1952fd8d` in ANV. v2: Optimize indirect draws (suggested by Kenneth Graunke) v3: (by Kenneth Graunke) - Fix an issue where indirect draws wouldn't set patch information before updating the compiled TCS. - Move some code back to iris_draw_vbo to avoid duplicating it. - Fix minor indentation issues. Signed-off-by: Illia Iorin <illia.iorin@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-11 23:56:52 -07:00
Kenneth Graunke	72ccefb529	iris: Use full ways for L3 cache setup on Icelake. Anuj fixed this in i965 and anv, but the fix never landed in iris. Fixes tessellation corruption on Icelake. Thanks to Rafael for bisecting this and tracking it down. Fixes: `d0996d5fab` iris: Emit default L3 config for the render pipeline Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-10 16:50:14 -07:00
Kenneth Graunke	a232aa5c50	iris: Also handle res->offset for buffer sampler/image views	2019-05-07 13:36:18 -07:00
Mike Blumenkrantz	ddd716e746	iris: support dmabuf imports with offsets this adds support for imports where the image data begins at an offset from the start of the buffer, as used in h/x264 fixes kwg/mesa#47 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-07 13:36:08 -07:00
Kenneth Graunke	a032a9665f	iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS This makes CompressedTexSubImage from a PBO source do proper GPU rendering to upload instead of stalling to map the PBO source on the CPU (then copying it on the CPU). Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this functionality, and to Jason Ekstrand for writing the code I adapted. Vulkan only supports a single layer, however, and this code tries to support multiple layers as long as it's miplevel 0. Improves performance in Sid Meier's Civilization VI: Average frame time (ms): -3.67423% +/- 1.46201% (n=5) 99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)	2019-05-06 09:50:32 -07:00
Kenneth Graunke	5ff5d0a895	iris: Disable dual source blending when shader doesn't handle it This is a port of Danylo's `eca4a6548d` which fixed the hang on i965. It fixes GPU hangs in his new Piglit test, arb_blend_func_extended-dual-src-blending-discard-without-src1. I avoided my own review feedback here, and decided to simply adjust 3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0]. It has never been clear to me which the hardware uses in every case. However, whacking the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang, and that packet is already dynamic, so it's easy to handle. I'd rather avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.	2019-05-02 21:14:49 -07:00
Rafael Antognolli	cf3cadacdf	iris: Update the surface state clear color address when available. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-30 08:31:44 -07:00
Kenneth Graunke	dcfca0af7c	iris: Set XY Clipping correctly. I was setting it based off a pipe_rasterizer_state field that appears to be entirely dead outside of the draw module respecting it. I should be setting it when the primitive type reaching the SF is neither points nor lines. This is, unfortunately, rather dirty, as we have to look at the rasterizer state, the geometry shader state, the tessellation evaluation shader state, and the primitive type...	2019-04-29 10:53:23 -07:00
Kenneth Graunke	6bd4cb920e	iris: Fix zeroing of transform feedback offsets in strange cases. Some of the dEQP.functional.transform_feedback tests end up doing the following sequence of operations: 1. BeginTransformFeedback 2. PauseTransformFeedback 3. Draw 4. ResumeTransformFeedback At step 1, we'd pack 3DSTATE_SO_BUFFER commands saying to zero the SO_WRITE_OFFSET registers. At step 2, we disable streamout, so step 3 doesn't bother emitting those commands. Then, step 4 re-packs new 3DSTATE_SO_BUFFER commands with offset = 0xFFFFFFFF, saying to continue appending at the existing offset. This loads the value from the BO as the offsets - but we never actually zeroed it. So, just maintain a flag saying "we actually emitted the commands", and stomp offset back to zero until we emit some.	2019-04-27 01:07:14 -07:00
Kenneth Graunke	529ace7887	iris: Silence unused function warning	2019-04-25 17:33:56 -07:00
Andrii Simiklit	4e9592c5fa	iris: make the TFB result visible to others OpenGL 4.6 Spec: "5.3.3 Rules ....... Note: “Updates” via rendering or transform feedback are treated consistently with updates via GL commands. Once EndTransformFeedback has been issued, any subsequent command in the same context that uses the results of the transform feedback operation will see the results." v2: removed a wrong comment ( Kenneth Graunke <kenneth@whitecape.org> ) v3: - flush+dirty depends on buffers usage history - removed an old hack ( Kenneth Graunke <kenneth@whitecape.org> ) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110404 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-25 11:48:04 -07:00
Kenneth Graunke	aa7306b4cf	iris: Some tidying for preemption support Just enable it during init_render_context on Gen10+, and move the Gen9 state tracking into iris_genx_state so it only exists on Gen9. Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-25 11:26:24 -07:00
Mike Blumenkrantz	7315882023	iris: add preemption support on gen9 this is basically just porting the following two commits to gallium: `d8b50e152a` `5c454661c6` resolves kwg/mesa#49 Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-24 14:47:08 -07:00
Mike Blumenkrantz	b53d256db8	iris: add support for INTEL_conservative_rasterization this hooks up the iris gallium driver to existing mesa bits which handle the implementation resolves kwg/mesa#8 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 16:36:30 -07:00
Kenneth Graunke	2208d5a683	iris: Fix DrawTransformFeedback math when there's a buffer offset We need to subtract the starting offset from the final offset before dividing by the stride. See src/intel/vulkan/genX_cmd_buffer.c:3142. Not known to fix anything.	2019-04-23 15:57:07 -07:00
Kenneth Graunke	77449d7c41	iris: Track valid data range and infer unsynchronized mappings. Applications frequently call glBufferSubData() to consecutive regions of a VBO to append new vertex data. If no data exists there yet, we can promote these to unsynchronized writes, even if the buffer is busy, since the GPU can't be doing anything useful with undefined content. This can avoid a bunch of unnecessary blitting on the GPU. u_threaded_context would do this for us, and in fact prohibits us from doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't hooked that up yet, and it may be useful to disable u_threaded_context when debugging...at which point we'd still want this optimization. At the very least, it would let us measure the benefit of threading independently from this optimization. And it's not a lot of code. Removes most stall avoidance blits in "Total War: WARHAMMER." On my Skylake GT4e at 1920x1080, this appears to improve performance in games by the following (but I did not do many runs for proper statistics gathering): ---------------------------------------------- \| DiRT Rally \| +2% (avg) \| + 2% (max) \| \| Bioshock Infinite \| +3% (avg) \| + 9% (max) \| \| Shadow of Mordor \| +7% (avg) \| +20% (max) \| ----------------------------------------------	2019-04-23 00:24:08 -07:00
Kenneth Graunke	5ad0c88dbe	iris: Replace buffer backing storage and rebind to update addresses. This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(), as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either of these happen, we swap out the backing storage of the buffer for a new idle BO, allowing us to write to it immediately without stalling or queueing a blit. On my Skylake GT4e at 1920x1080, this improves performance in games: ----------------------------------------------- \| DiRT Rally \| +25% (avg) \| +17% (max) \| \| Bioshock Infinite \| +22% (avg) \| +11% (max) \| \| Shadow of Mordor \| +27% (avg) \| +83% (max) \| -----------------------------------------------	2019-04-23 00:24:08 -07:00
Kenneth Graunke	b45dff1da8	iris: Rework image views to store pipe_image_view. This will be useful when rebinding images.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	2f60850a3f	iris: Rework UBOs and SSBOs to use pipe_shader_buffer This unifies a bunch of the UBO and SSBO code to use common structures. Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size, which can be useful when filling out the surface state.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	00d4019676	iris: Track bound constant buffers This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS) looking to see if any resources are bound.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	1566054459	iris: Track bound and writable SSBOs Marek recently extended pipe->set_shader_buffers() to take an extra writable_bitmask parameter, indicating which SSBOs are writable (some may be bound read-only). We can use this to decide whether to set EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us some cross-batch flushing if the SSBO is used for reading in both the render and compute engines.	2019-04-22 11:31:14 -07:00
Kenneth Graunke	36478b9f77	iris: Enable the dual_color_blend_by_location driconf option. This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.	2019-04-22 09:36:36 -07:00
Lionel Landwerlin	d1be67db39	iris: implement WaEnableStateCacheRedirectToCS This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-18 17:43:08 +01:00
Kenneth Graunke	8bf9b7b5b6	intel: Emit 3DSTATE_VF_STATISTICS dynamically Pipeline statistics queries should not count BLORP's rectangles. (23) How do operations like Clear, TexSubImage, etc. affect the results of the newly introduced queries? DISCUSSION: Implementations might require "helper" rendering commands be issued to implement certain operations like Clear, TexSubImage, etc. RESOLVED: They don't. Only application submitted rendering commands should have an effect on the results of the queries. Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when the driver is hacked to always perform glBufferData via a GPU staging copy (for debugging purposes). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-14 19:58:04 -07:00
Kenneth Graunke	4fcb749044	iris: Actually pin the scratch BO. We were pinning it for compute shaders, and pinning it when restoring saved buffers, but we never actually pinned it in the original batch for VS/TCS/TES/GS/FS. Fixes rendering in GFXBench5's Tessellation demo and a bunch of Piglit geometry shader tests.	2019-04-11 15:03:27 -07:00
Marek Olšák	66a82ec6f0	gallium: add writable_bitmask parameter into set_shader_buffers to indicate write usage per buffer. This is just a hint (it will be used by radeonsi). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-04 19:28:52 -04:00
Danylo Piliaiev	b19494c54e	iris: Fix assert when using vertex attrib without buffer binding The GL 4.5 spec says: "If any enabled array’s buffer binding is zero when DrawArrays or one of the other drawing commands defined in section 10.4 is called, the result is undefined." The result is undefined but it should not crash. Fixes: gl-3.1-vao-broken-attrib Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-04 22:57:24 +00:00
Tapani Pälli	41f76dd513	iris: move variable to the scope where it is being used iris_upload_border_color is passed a pointer which points to variable that is introduced in a different scope. CID: 1444296 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-04 04:43:20 +00:00
Rafael Antognolli	7339660e80	iris: Add aux.sampler_usages. We want to skip some types of aux usages (for instance, ISL_AUX_USAGE_HIZ when the hardware doesn't support it, or when we have multisampling) when sampling from the surface. Instead of checking for those cases while filling the surface state and leaving it blank, let's have a version of aux.possible_usages for sampling. This way we can also avoid allocating surface state for the cases we don't use. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:45 -07:00
Rafael Antognolli	dfc5620a41	iris: Do not allocate clear_color_bo for gen8. Since we are not using it for the clear color, there's no need to allocate it. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:41 -07:00
Rafael Antognolli	2660667284	iris/gen8: Re-emit the SURFACE_STATE if the clear color changed. The swizzle for rendering surfaces is always identity. So when we are doing the fast clear, we don't have enough information to store the clear color OR'ed with the Shader Channel Select bits for the dword in the SURFACE_STATE. Instead of trying to patch up the SURFACE_STATE correctly later, by reading the color from the clear color state buffer and then doing all the operations to store it, let's just re-emit the whole SURFACE_STATE. That should make things way simpler on gen8, and we can still use the clear color state buffer for gen9+. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:33 -07:00
Rafael Antognolli	6a02873687	iris: Only update clear color for gens 8 and 9. Newer gens can read it directly. Also properly skip updating the ISL_AUX_USAGE_NONE surface. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:24:15 -07:00
Rob Clark	b2d651b862	iris: fix set_sampler_view Update to match docs. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-30 13:05:31 -04:00
Anuj Phogat	9c421d6b47	iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 19:59:59 +00:00
Anuj Phogat	e0f4359ec1	iris/icl: Set Enabled Texel Offset Precision Fix bit h/w specification requires this bit to be always set. See Mesa commit `5eb173304b`. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 19:59:59 +00:00
Danylo Piliaiev	c8abe03f3b	i965,iris,anv: Make alpha to coverage work with sample mask From "Alpha Coverage" section of SKL PRM Volume 7: "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature." From OpenGL spec 4.6, "15.2 Shader Execution": "The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader." From OpenGL spec 4.6, "17.3.1 Alpha To Coverage": "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value." Similar wording could be found in Vulkan spec 1.1.100 "25.6. Multisample Coverage" Thus we need to compute alpha to coverage dithering manually in shader and replace sample mask store with the bitwise-AND of sample mask and alpha to coverage dithering. The following formula is used to compute final sample mask: m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)) dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) \| 0x0808 * (m & 2) \| 0x0100 * (m & 1) sample_mask = sample_mask & dither_mask Credits to Francisco Jerez <currojerez@riseup.net> for creating it. It gives a number of ones proportional to the alpha for 2, 4, 8 or 16 least significant bits of the result. GEN6 hardware does not have issue with simultaneous usage of sample mask and alpha to coverage however due to the wrong sending order of oMask and src0_alpha it is still affected by it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-03-25 13:54:55 -07:00
Chris Wilson	db99d02fce	iris: Push heavy memchecker code to DEBUG Invoking VALGRIND_CHECK_MEM_IS_DEFINED pulls in enough code to convince gcc to not inline __gen_uint and results in a lot of packing code ending up out-of-line with lots of stack copying. To ameliorate this, only insert the check inside the packer if DEBUG is defined and instead perform the validation checking before submitting the batch to the kernel. This should give accurate results if --trace-origins=yes is used, and failing that we can recompile in full debug mode to check on insertion. Improve drawoverhead baseline by 25% with a default build with valgrind-dev installed (with effectively no loss of vg coverage). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-22 10:38:03 -07:00
Kenneth Graunke	66c100a8d6	iris: Skip resolves and flushes altogether if unnecessary Improves drawoverhead baseline scores by 1.17x.	2019-03-21 20:28:17 -07:00
Rafael Antognolli	93123417dd	iris: Track fast clear color. v2: Update tracked clear color when we update the surface state. v3: Update all aux surface states when updating the clear color. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:26 -07:00
Rafael Antognolli	131b42f0aa	iris: Implement fast clear color. If all the restrictions are satisfied, do a fast clear instead of regular clear. v2: - add perf_debug() when we can't fast clear (Ken) - improve comment: s/miptree/resource/ (Ken) - use swizzle_color_value from blorp (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00

... 2 3 4 5 6 ...

593 commits