fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 20:00:10 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	face221283	iris: Properly unreference extra VBOs for draw parameters bound_vertex_buffers doesn't include extra draw parameters buffers. Tracking this correctly is kind of complicated, and iris_destroy_state isn't exactly in a hot path, so just loop over all VBO bindings. Fixes: `4122665dd9` (iris: Enable ARB_shader_draw_parameters support) Reported-by: Sergii Romantsov <sergii.romantsov@globallogic.com>	2019-10-08 11:14:21 -07:00
Marek Olšák	732ea0b213	gallium: add PIPE_RESOURCE_FLAG_SINGLE_THREAD_USE to skip util_range lock u_upload_mgr sets it, so that util_range_add can skip the lock. The time spent in tc_transfer_flush_region decreases from 0.8% to 0.2% in torcs on radeonsi. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-07 20:05:00 -04:00
Kenneth Graunke	6d9c1f30e4	iris: Drop vtbl usage for some load_register calls We can just call the actual functions directly.	2019-10-07 14:10:33 -07:00
Jordan Justen	ae9c311b9a	iris/state: Move reg/mem load/store functions earlier in file Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-10-07 14:10:33 -07:00
Kenneth Graunke	90a35752b4	iris: Drop bonus parameters from iris_init_*_context() Nothing uses vtbl or dbg, and screen is available from the batch.	2019-10-07 13:15:56 -07:00
Kenneth Graunke	bd46dfa889	Revert "iris: Hack up a SKL/Gen9LP PS push constant fifo depth workaround" This reverts commit `4f857423b3`. It caused GPU hangs on all affected platforms, in e.g. Piglit bin/stencil-twoside -auto -fbo.	2019-10-07 09:08:41 -07:00
Kenneth Graunke	4f857423b3	iris: Hack up a SKL/Gen9LP PS push constant fifo depth workaround This is a port of Nanley's `904c2a617d` from i965 to iris. One concern is that iris uses larger batches, and also emits far fewer commands, so we may come closer to the 500 limit within a batch, and could need to supplement this with actual counting. Manhattan 3.0 had 239 3DSTATE_CONSTANT_PS packets in a batch, Unigine Valley had 155. So it seems like we're still in the realm of safety.	2019-10-05 17:18:45 -04:00
Kenneth Graunke	f1bba22f69	iris: Refactor push constant allocation so we can reuse it We'll need this for a workaround shortly. While refactoring, also improve the comment slightly.	2019-10-05 17:18:44 -04:00
Kenneth Graunke	309924c3c9	iris: Fix iris_rebind_buffer() for VBOs with non-zero offsets. We can't just check for the BO base address, we need to check for the full address including any offset we may have applied. When updating the address, we need to include the offset again. Fixes: `5ad0c88dbe` ("iris: Replace buffer backing storage and rebind to update addresses.")	2019-09-30 12:41:03 -07:00
Kenneth Graunke	50c0dd8621	Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM" This reverts commit `729de1488f`. It turns out that, although the register is in the logical context, it isn't whitelisted, so we can't actually write it from userspace batch buffers. The write just becomes a noop, which is why we saw no performance changes. I manually whitelisted it, and still observed no performance gains, but it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments on the iris driver. So we might need to fix something before enabling this. To prevent it randomly getting turned on should the kernel ever whitelist this register, we revert the patch for now.	2019-09-23 16:31:23 -07:00
Kenneth Graunke	a16975e615	iris: Rework iris_update_draw_parameters to be more efficient This improves a couple of things: 1. We now only update anything if the shader actually cares. Previously, is_indexed_draw was causing us to flag dirty vertex buffers, elements, and SGVs every time the shader switched between indexed and non-indexed draws. This is a very common situation, but we only need that information if the shader uses gl_BaseVertex. We were also flagging things when switching between indirect and direct draws; now we only bother if it matters. 2. We upload new draw parameters only when necessary. When we detect that the draw parameters have changed, we upload a new copy and use that. Previously we were uploading it every time the vertex buffers were dirty (for possibly unrelated reasons) and the shader needed that info. Tying these together also makes the code a bit easier to follow. In Civilization VI's benchmark, this code was flagging dirty state many times per frame (49 average, 16 median, 614 maximum). Now it occurs exactly once for the entire run.	2019-09-18 22:50:52 -07:00
Kenneth Graunke	6841f11d14	iris: Use state_refs for draw parameters. iris_state_ref is a <resource, offset> tuple, which is exactly what we need here.	2019-09-18 22:50:52 -07:00
Kenneth Graunke	3da8a8a3d6	iris: Avoid uploading SURFACE_STATE descriptors for UBOs if possible If we can entirely push uniform data, we don't need a SURFACE_STATE descriptor for pulling data. Since constant uploads are a very common operation, and being able to push all data is also very common, we would like to avoid the overhead in this case. This patch defers uploading new descriptors. Instead of handling that at iris_set_constant_buffer, we do it at iris_update_compiled_shaders, where we can see the currently bound shader variants. If any need pull descriptors, and descriptors are missing, we update them and flag that the binding table also needs to be refreshed. Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by 31.9774% +/- 1.12947% (n=15). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	dd83ef0d1a	iris: Track per-stage bind history, reduce work accordingly We now track per-stage bind history for constant and shader buffers, shader images, and sampler views by adding an extra res->bind_stages field to go with res->bind_history. This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages involved, and also skip some CPU overhead in iris_rebind_buffer. Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace on Icelake. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	e7db3577f8	iris: Explicitly emit 3DSTATE_BTP_XS on Gen9 with DIRTY_CONSTANTS_XS Right now, we usually flag both IRIS_DIRTY_{CONSTANTS,BINDINGS}_XS, because we have SURFACE_STATE for constant buffers in case the shaders access them via pull mode. But this flagging is overkill in many cases. Gen8 and Gen11 don't need it at all. Gen9 doesn't need that large of a hammer in all cases. Just handle it explicitly so the right thing happens. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	caa0aebd01	iris: Flag IRIS_DIRTY_BINDINGS_XS on constant buffer rebinds We upload a new SURFACE_STATE for the UBO/SSBO in question, which means that we need new binding tables as well. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	f8c44e4ed7	iris: Skip allocating a null surface when there are 0 color regions. The compiler now sets the "Null Render Target" bit in the RT write extended message descriptor, causing it to write to an implicit null surface without us needing to set one up in the binding table. Together with the last patch, this improves performance in Car Chase on an Icelake 8x8 (locked to 700 MHz) by 0.0445526% +/- 0.0132736% (n=832). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 14:27:51 -07:00
Kenneth Graunke	c9fb704f72	iris: Initialize ice->state.prim_mode to an invalid value It was calloc'd to 0, which is PIPE_PRIM_POINTS, so we failed to notice that an initial primitive of points was new, and failed to update the "primitive is points or lines" field. We do not need to reset this on device loss because we're tracking the last primitive mode sent to us on the CPU via draw_vbo, not the last primitive mode sent to the GPU. Fixes several tests: - dEQP-GLES3.functional.clipping.point.wide_point_clip - dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center - dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner Fixes: `dcfca0af7c` ("iris: Set XY Clipping correctly.")	2019-09-13 16:31:29 -07:00
Anuj Phogat	729de1488f	intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM Initial benchmarking didn't show any performance benefits. But it might eventually. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-09-11 11:29:37 -07:00
Kenneth Graunke	077a1952cc	iris: Fix constant buffer sizes for non-UBOs Since the system value refactor, we've accidentally only been setting cbuf->buffer_size in the UBO case, and not in the uploaded-constants case. We use cbuf->buffer_size to fill out the SURFACE_STATE entry, so it needs to be initialized in both cases. Fixes: `3b6d787e40` ("iris: move sysvals to their own constant buffer")	2019-09-10 10:53:15 -07:00
Kenneth Graunke	7d28e9ddd6	iris: Optimize out redundant sampler state binds This cuts roughly 85% of the 3DSTATE_SAMPLER_STATE_POINTERS_PS calls in the J2DBench images test. For some reason, the state tracker is calling bind_sampler_state with the same sampler state in a bunch of cases.	2019-09-09 11:55:27 -07:00
Kenneth Graunke	9173459b95	iris: Ignore line stipple information if it's disabled The line stipple pattern and factor only matter if line stippling is actually enabled. Otherwise, we can safely ignore it. PBO upload may give us zero for line stipple information, while normal drawing tends to give us an actual stipple pattern such as 0xffff. This was causing us to flag IRIS_DIRTY_LINE_STIPPLE way too often, leading to useless 3DSTATE_LINE_STIPPLE commands, which are non-pipelined and thus very expensive. Improves performance in Manhattan 3.0 on Skylake GT4e by 0.149261% +/- 0.0380796% (n=210). On an Icelake 8x8 with the GPU frequency locked at 700 MHz, it improves performance by 0.423756% +/- 0.222843% (n=3).	2019-09-09 10:55:20 -07:00
Jordan Justen	9790cfcefa	anv,iris: L3ALLOC register replaces L3CNTLREG for gen12 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-09-06 13:11:25 -07:00
Kenneth Graunke	0d0ae16e8f	intel: Stop redirecting state cache to command streamer cache section This bit redirects the state cache from the unified/RO sections of the L3 cache to the "CS command buffer" section of the cache, which would be set up via TCCNTLREG. The documentation says: "Additionally, this redirection should be enabled only if there is a non-zero allocation for the CS command buffer section." We don't allocate any cache to the CS command buffer section, so enabling this redirection effectively disabled the state cache. The Windows driver only sets up that section when using POSH, which we do not currently use. So, leave it unallocated and disable the redirection to get a functional state cache again. Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%, and Car Chase by 2%.	2019-09-06 10:57:55 -07:00
Kenneth Graunke	68be5ff8d0	iris: Invalidate state/texture/constant caches after STATE_BASE_ADDRESS Jason pointed out that the caches likely refer to offsets from dynamic and surface state base addresses, so when we change those, we need to invalidate the caches. Comment borrowed from src/intel/vulkan/genX_cmd_buffer.c.	2019-09-06 10:57:55 -07:00
Caio Marcelo de Oliveira Filho	63f0259aeb	iris: Guard GEN9-only function in Iris state to avoid warning Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-23 13:25:27 -07:00
Kenneth Graunke	9310ae6f68	iris: Set MOCS in all STATE_BASE_ADDRESS commands Rafael Antognolli tracked down a performance gap between i965 and iris in Synmark2's OglCSDof microbenchmark, noting that iris was performing substantially more memory reads and writes, with substantially fewer L3 hits. He suggested that something might be wrong with MOCS, or L3 configs, at which point I came up with a theory... It would appear that the STATE_BASE_ADDRESS command updates the MOCS settings for various base addresses even if you don't specify the "Modify Enable" bit for that address. Until now, we had been setting only the MOCS for bases we intended to change, leaving the others "blank" which is MOCS table entry 0, which is uncached. Most data access has a more specific MOCS (e.g. in SURFACE_STATE), but scratch access uses the Stateless Data Port Access MOCS from STATE_BASE_ADDRESS. So this meant all scratch access was uncached. Improves performance in Synmark2's OglCSDof by 2x, bringing iris on par with the existing i965 driver. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-23 10:21:48 -07:00
Kenneth Graunke	1cd13ccee7	iris: Update fast clear colors on Gen9 with direct immediate writes. Gen11 stores the fast clear color in an "indirect clear buffer", as a packed pixel value. Gen9 hardware stores it as a float or integer value, which is interpreted via the format. We were trying to store that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM it from there to the actual SURFACE_STATE bytes where it's stored. This unfortunately doesn't work for blorp_copy(), which does bit-for-bit copies, and overrides the format to a CCS-compatible UINT format. This causes the clear color to be interpreted in the overridden format. Normally, we provide the clear color on the CPU, and blorp_blit.c:2611 converts it to a packed pixel value in the original format, then unpacks it in the overridden format, so the clear color we use expands to the bits we originally desired. However, BLORP doesn't support this pack/unpack with an indirect clear buffer, as it would need to do the math on the GPU. On Gen11+, it isn't necessary, as the hardware does the right thing. This patch changes Gen9 to stop using an indirect clear buffer and simply do PIPE_CONTROLs with post-sync write immediate operations to store the new color over the surface states for regular drawing. BLORP continues streaming out surface states, and handles fast clear colors on the CPU. Fixes: `53c484ba8a` ("iris: blorp using resolve hooks") Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-08-22 18:31:14 -07:00
Kenneth Graunke	117a0368b0	iris: Fix broken aux.possible/sampler_usages bitmask handling For renderable surfaces, we allocate SURFACE_STATEs for each bit in res->aux.possible_usages. Sampler views use res->aux.sampler_usages. When pinning buffers, we call surf_state_offset_for_aux() to calculate the offset to the desired surface state. surf_state_offset_for_aux() took an aux_modes parameter, which should be one of those two fields. However...it was not using that parameter. It always used the broader res->aux.possible_usages field directly. One of the callers, update_clear_value(), was passing incorrect masks for this parameter. It iterated through the bits in order, using u_bit_scan(), which destructively modifies the mask. So each time we called it, the count of bits before our selected mode was 0, which would cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE, rather than updating each in turn. This was hidden by the earlier bug where surf_state_offset_for_aux() ignored the parameter. Fixes: `7339660e80` ("iris: Add aux.sampler_usages.") Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-08-22 18:31:14 -07:00
Kenneth Graunke	f6c44549ee	iris: Replace devinfo->gen with GEN_GEN This is genxml, we can compile out this code. Fixes: `2660667284` ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.") Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-08-22 18:31:14 -07:00
Sagar Ghuge	fe0e9db797	iris: Enable non coherent framebuffer fetch on broadwell v2: Use GEN_GEN in iris_state (Kenneth Graunke) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:58 -07:00
Sagar Ghuge	57ce422e20	iris: Free resource if failed to allocate surface state Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:55 -07:00
Sagar Ghuge	02244bc515	iris: Pass isl_surf to fill_surface_state Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Suggested-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:45 -07:00
Sagar Ghuge	638a157e02	iris: Add infrastructure to support non coherent framebuffer fetch Create separate SURFACE_STATE for render target read in order to support non coherent framebuffer fetch on broadwell. Also we need to resolve framebuffer in order to support CCS_D. v2: Add outputs_read check (Kenneth Graunke) v3: 1) Import Curro's comment from get_isl_surf 2) Rename get_isl_surf method 3) Clean up allocation in case of failure Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:44 -07:00
Jason Ekstrand	16edd02bfa	iris: Only request an input mask if the shader needs it Fixes: `aebca3961b` "iris: Fix handling of SIMD32 fragment shaders" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-16 19:59:42 -05:00
Jordan Justen	246eebba4a	iris: Export and import surfaces with modifiers that have aux data The DRI interface for modifiers with aux data treats the aux data as a separate plane of the main surface. When the dri layer requests the plane associated with the aux data, we save the required information into the dri aux plane image. Later when the image is used, the dri plane image will be available in the pipe_resource structure's `next` field. Therefore in iris, we reconstruct the aux setup from this separate dri plane image when the image is used. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 15:20:47 -07:00
Jordan Justen	aad36dfd16	iris: Add aux offset into hiz_address This is not currently required because the hiz buffer is in a separate buffer, and therefore the offset is 0. If we combine the aux buffer with the main surface buffer, then the hiz offset may become non-zero. Suggested-by: Nanley Chery <nanley.g.chery@intel.com> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 15:20:39 -07:00
Rafael Antognolli	a1a499e7fe	iris/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced. If the pixel pipes have a different number of subslices, emit a slice hashing table that will ensure proper workload distribution. v2: Don't need to set the mask - it's mbo (Ken). v3: Don't keep a reference to the resource used for emitting the table (Ken).	2019-08-12 16:19:08 -07:00
Francisco Jerez	026773397b	iris/gen9: Optimize slice and subslice load balancing behavior. See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 13:17:58 -07:00
Tapani Pälli	d4b574f26a	iris: reorder arguments as expected by the function CID: 1452262 Fixes: `b4c54894bb` "iris: Handle vertex shader with window space position" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>	2019-08-12 13:08:26 +03:00
Mark Janes	0fd4359733	iris/perf: implement routines to return counter info With this commit, Iris will report that AMD_performance_monitor is supported, and will allow the caller to query the available metrics. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-09 19:28:03 -07:00
Danylo Piliaiev	b4c54894bb	iris: Handle vertex shader with window space position Iris advertises support for PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION so let's actually implement it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110657 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-06 20:25:35 +00:00
Jason Ekstrand	aebca3961b	iris: Fix handling of SIMD32 fragment shaders The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out which bits to pull out of the prog data and stuff where. Therefore, they need to be called with the final set of _NPixelDispatchEnable bits after we've done the workaround for SIMD32 and 16x MSAA. Otherwise, if you end up with a somewhat odd combination of enables, the GRF start reg and KSP data ends up in the wrong slots. In particular, running SIMD32-only is broken but several other combinations are as well. Fixes: `5445c176e2` "iris: Disable SIMD32 when using a 16x MSAA..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-03 22:24:40 +00:00
Timothy Arceri	2afedfaf9a	iris: add support for gl_ClipVertex in tess eval shaders Required for OpenGL compat support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:37 -07:00
Timothy Arceri	00b5bf2d72	iris: add support for gl_ClipVertex in geometry shaders This will enable us to support the OpenGL compat profile. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:27 -07:00
Kenneth Graunke	b61f17d362	iris: Skip emitting 3DSTATE_INDEX_BUFFER if possible We were emitting 3DSTATE_INDEX_BUFFER on every indexed draw, even if back-to-back draws referred to the same index buffer. This improves drawoverhead scores in the DrawElements cases by about 10%, by giving us even more minimal batches.	2019-07-31 15:14:10 -07:00
Kenneth Graunke	3a22a8bf49	iris: Skip repeated depth buffer disables. Often, the depth buffer is entirely disabled while the color render targets change. For example, GenerateMipmaps will change the color render target for each miplevel, but there is no depth buffer. In the Civilization VI benchmark, this drops the median number of 3DSTATE_DEPTH_BUFFER etc. packets emitted per frame from 472 to 34.	2019-07-30 19:47:41 -07:00
Kenneth Graunke	44e713eddb	iris: Fix SO offset to be 32-bit in DrawTransformFeedback handling We accidentally started copying a full 64-bit value rather than copying a 32-bit offset and zeroing the top 32 bits. This caused us to compute bogus vertex counts, which could lead to GPU hangs in some cases. Thanks to Clayton Craft for catching the regressions! Fixes: `0e24d10ff5` ("iris: Use gen_mi_builder to handle CS ALU operations.")	2019-07-29 16:38:19 -07:00
Kenneth Graunke	0e24d10ff5	iris: Use gen_mi_builder to handle CS ALU operations. In a few cases, we switch to MI_MATH instead of MI_PREDICATE, just because we were already doing math and it's easier to chain together. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	fe7ed6b057	iris: Make iris_query.c a genxml-compiled file. This will let us use Jason's new MI-builder shortly. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
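Commit `117a0368b0` describes a classic pitfall: iterating with `u_bit_scan()` clears each bit from the mask as it goes, so a "count of set bits below the current one" taken from that same mask is always zero. The following is a minimal, self-contained sketch of the bug and the fix. It is not the iris code. `bit_scan()` is a hypothetical stand-in for Mesa's `u_bit_scan()`, and `slot_for_usage()` assumes one SURFACE_STATE is allocated per set bit, in bit order.

```c
#include <assert.h>

/* Hypothetical stand-in for u_bit_scan(): returns the index of the lowest
 * set bit and CLEARS that bit from *mask (destructive). */
static int bit_scan(unsigned *mask)
{
   assert(*mask != 0);
   int i = 0;
   while (!(*mask & (1u << i)))
      i++;
   *mask &= ~(1u << i);
   return i;
}

/* Number of set bits in `mask` strictly below bit position `bit`. */
static unsigned bits_below(unsigned mask, int bit)
{
   unsigned below = mask & ((1u << bit) - 1u);
   unsigned n = 0;
   while (below) {
      below &= below - 1u; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Index of a usage's SURFACE_STATE, assuming one state per set bit in
 * aux_modes, laid out in bit order. */
static unsigned slot_for_usage(unsigned aux_modes, int usage)
{
   return bits_below(aux_modes, usage);
}

/* Buggy: the slot is computed from the partially scanned mask, whose lower
 * bits are already cleared, so every lookup yields slot 0. */
static void collect_slots_buggy(unsigned possible, unsigned *out, int *n)
{
   unsigned mask = possible;
   *n = 0;
   while (mask) {
      int usage = bit_scan(&mask);
      out[(*n)++] = slot_for_usage(mask, usage);
   }
}

/* Fixed: scan a copy, but compute the slot from the full, unmodified mask. */
static void collect_slots_fixed(unsigned possible, unsigned *out, int *n)
{
   unsigned mask = possible;
   *n = 0;
   while (mask) {
      int usage = bit_scan(&mask);
      out[(*n)++] = slot_for_usage(possible, usage);
   }
}
```

With a mask of `0xB` (bits 0, 1 and 3), the fixed loop yields slots 0, 1, 2, while the buggy loop yields 0, 0, 0, matching the "always update the first SURFACE_STATE" symptom the commit describes.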
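Commit `c9fb704f72` fixes a change-detection bug. If the cached "last value" starts out equal to a legitimate value, the first draw using that value looks unchanged, so derived state is never computed. Here, calloc'd memory gives 0, which is PIPE_PRIM_POINTS. Below is a small sketch of the problem and of the sentinel-initialization fix. All names (`enum prim`, `PRIM_MAX`, `struct prim_ctx`, `draw()`) are hypothetical and only mirror the idea described in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical primitive enum; as in Gallium, the points value is 0. */
enum prim { PRIM_POINTS = 0, PRIM_LINES, PRIM_TRIANGLES, PRIM_MAX };

struct prim_ctx {
   enum prim prim_mode;      /* last primitive mode seen by draw() */
   bool points_or_lines;     /* derived state that depends on prim_mode */
   unsigned derived_updates; /* how often derived state was recomputed */
};

static void prim_ctx_init(struct prim_ctx *c, bool use_sentinel)
{
   memset(c, 0, sizeof(*c));   /* like calloc: prim_mode == PRIM_POINTS */
   if (use_sentinel)
      c->prim_mode = PRIM_MAX; /* invalid, so any real primitive is "new" */
}

static void draw(struct prim_ctx *c, enum prim mode)
{
   if (mode != c->prim_mode) {
      c->prim_mode = mode;
      c->points_or_lines = (mode == PRIM_POINTS || mode == PRIM_LINES);
      c->derived_updates++;
   }
}
```

Without the sentinel, a first draw of points never sets `points_or_lines`. With the sentinel, the first draw always recomputes it.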
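Commits `7d28e9ddd6` and `b61f17d362` both avoid re-emitting hardware state when a bind does not actually change anything. The sketch below applies that idea to sampler states. It assumes sampler state objects are immutable, so comparing pointers is enough to detect a real change. `struct sampler_ctx`, `bind_sampler_states()` and `DIRTY_SAMPLER_STATES_PS` are hypothetical names, not the iris API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_SAMPLERS 16
/* Hypothetical dirty bit standing in for the driver's real flag. */
#define DIRTY_SAMPLER_STATES_PS (UINT64_C(1) << 0)

struct sampler_state { int wrap; int filter; };

struct sampler_ctx {
   const struct sampler_state *samplers[MAX_SAMPLERS]; /* bound state objects */
   uint64_t dirty;                                      /* pending re-emit flags */
};

static void sampler_ctx_init(struct sampler_ctx *c)
{
   memset(c, 0, sizeof(*c));
}

/* Bind `count` sampler states starting at slot `start`, and flag dirty state
 * only if at least one binding actually changed. */
static void bind_sampler_states(struct sampler_ctx *c, unsigned start,
                                unsigned count,
                                const struct sampler_state *const *states)
{
   assert(start + count <= MAX_SAMPLERS);
   bool changed = false;
   for (unsigned i = 0; i < count; i++) {
      if (c->samplers[start + i] != states[i]) {
         c->samplers[start + i] = states[i];
         changed = true;
      }
   }
   if (changed)
      c->dirty |= DIRTY_SAMPLER_STATES_PS;
}
```

Binding the same state object twice in a row sets the dirty bit only on the first call. That is the kind of redundant bind the sampler commit reports the state tracker issuing.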


666 commits