fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-20 22:30:12 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	0b7ecfdda5	iris: Implement the Broadwell NP Z PMA Stall Fix This should help avoid stalls in the pixel mask array in certain non-promoted depth cases. It especially helps for Z16, as each bit in the PMA corresponds to two pixels when using Z16, as opposed to the usual one pixel. Improves performance in GFXBench5 TRex by 22% (n=1).	2019-10-08 21:53:12 -07:00
Kenneth Graunke	b9e93db208	intel: Increase Gen11 compute shader scratch IDs to 64. From the MEDIA_VFE_STATE docs: "Starting with this configuration, the Maximum Number of Threads must be set to (#EU * 8) for GPGPU dispatches. Although there are only 7 threads per EU in the configuration, the FFTID is calculated as if there are 8 threads per EU, which in turn requires a larger amount of Scratch Space to be allocated by the driver." It's pretty clear that we need to increase this for scratch address calculations, because the FFTID has a certain bit-pattern. The quote above seems to indicate that we should increase the actual thread count programmed in MEDIA_VFE_STATE as well, but we think the intention is to only bump the scratch space. Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8. Fixes: `5ac804bd9a` ("intel: Add a preliminary device for Ice Lake") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-09-23 16:59:40 -07:00
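The scratch sizing described in the commit above can be sketched as follows. This is a minimal illustration, not the driver's code: the function names are hypothetical, and it assumes scratch is sized per subslice as (EUs per subslice × 8) thread slots, following the quoted MEDIA_VFE_STATE text that the FFTID is calculated as if there were 8 threads per EU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of scratch IDs per subslice when the FFTID is
 * computed as if every EU had 8 threads (even if only 7 actually exist). */
static uint32_t
scratch_ids_per_subslice(uint32_t eus_per_subslice)
{
   assert(eus_per_subslice > 0);
   return eus_per_subslice * 8;
}

/* Hypothetical helper: total scratch bytes to allocate for compute, given
 * the per-thread scratch size and the subslice count. */
static uint64_t
compute_scratch_bytes(uint32_t per_thread_bytes, uint32_t subslices,
                      uint32_t eus_per_subslice)
{
   return (uint64_t)per_thread_bytes * subslices *
          scratch_ids_per_subslice(eus_per_subslice);
}
```

With 8 EUs per subslice (an assumption about the "8x8" configuration), this yields the 64 scratch IDs named in the commit title.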
Kenneth Graunke	3da8a8a3d6	iris: Avoid uploading SURFACE_STATE descriptors for UBOs if possible If we can entirely push uniform data, we don't need a SURFACE_STATE descriptor for pulling data. Since constant uploads are a very common operation, and being able to push all data is also very common, we would like to avoid the overhead in this case. This patch defers uploading new descriptors. Instead of handling that at iris_set_constant_buffer, we do it at iris_update_compiled_shaders, where we can see the currently bound shader variants. If any need pull descriptors, and descriptors are missing, we update them and flag that the binding table also needs to be refreshed. Improves performance in GFXBench5 gl_driver2 on an i7-6770HQ by 31.9774% +/- 1.12947% (n=15). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	f8c44e4ed7	iris: Skip allocating a null surface when there are 0 color regions. The compiler now sets the "Null Render Target" bit in the RT write extended message descriptor, causing it to write to an implicit null surface without us needing to set one up in the binding table. Together with the last patch, this improves performance in Car Chase on an Icelake 8x8 (locked to 700MHz) by 0.0445526% +/- 0.0132736% (n=832). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 14:27:51 -07:00
Kenneth Graunke	6a82a374b4	iris: trivial whitespace fixes	2019-09-11 21:33:41 -07:00
Connor Abbott	dcc64fcfed	nir: Fix num_ssbos when lowering atomic counters Otherwise it's impossible to know the maximum SSBO index for both internal TGSI shaders from TTN (which don't have any notion of atomic counters and no offset) as well as shaders from GLSL. I fixed everything I could find while grepping for num_ssbos and num_abos, which hopefully is everything (iris was the only user I could find that uses it in a meaningful way). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-03 15:54:54 +02:00
Jason Ekstrand	f58e0405b6	intel/fs: Drop the gl_program from fs_visitor It's not used by anything anymore now that so much lowering has been moved into NIR. Sadly, we still need one in brw_compile_gs() for geometry shaders on Sandy Bridge. Short of a lot of pointless work, that one's probably not going away. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-25 01:02:52 -05:00
Jason Ekstrand	951cf94521	nir: Add explicit signs to image min/max intrinsics This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-21 17:19:55 +00:00
Sagar Ghuge	fe0e9db797	iris: Enable non coherent framebuffer fetch on broadwell v2: Use GEN_GEN in iris_state (Kenneth Graunke) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:58 -07:00
Sagar Ghuge	58471e20d2	iris: Add render target read entry in binding table This will be used in next patches for supporting non coherent framebuffer fetch on Broadwell. v2: Fix comment (Kenneth Graunke) v3: 1) Fix a few nits (Caio) 2) Add comment (Caio) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:31 -07:00
Jason Ekstrand	134607760a	intel/compiler: Fill a compiler statistics struct This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary driver readable form of the same statistics we dump out to stderr when INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Rhys Perry	c52c54a746	anv,i965,iris: deduplicate setting of total_shared v5: add patch Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-08 12:10:39 -05:00
Danylo Piliaiev	b4c54894bb	iris: Handle vertex shader with window space position Iris advertises support for PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION so let's actually implement it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110657 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-06 20:25:35 +00:00
Timothy Arceri	2afedfaf9a	iris: add support for gl_ClipVertex in tess eval shaders Required for OpenGL compat support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:37 -07:00
Timothy Arceri	00b5bf2d72	iris: add support for gl_ClipVertex in geometry shaders This will enable us to support the OpenGL compat profile. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:27 -07:00
Jason Ekstrand	c84b8eeeac	intel/compiler: Be more conservative about subgroup sizes in GL The rules for gl_SubgroupSize in Vulkan require that it be a constant that can be queried through the API. However, all GL requires is that it's a uniform. Instead of always claiming that the subgroup size in the shader is 32 in GL like we have to do for Vulkan, claim 8 for geometry stages, the maximum for fragment shaders, and the actual size for compute. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Timothy Arceri	80c2c17e1e	iris: change last_vue_stage() to look at uncompiled shaders This allows us to find the last vue stage before we have compiled the shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Kenneth Graunke	a01770b9c8	iris: Fix key->input_vertices for 8_PATCH TCS mode. We were failing to flag the program dirty when it changed. Also, we were unnecessarily setting key->input_vertices for SINGLE_PATCH mode, which would reduce program cache hits. Only set it if needed.	2019-07-11 01:18:24 -07:00
Kenneth Graunke	c58f52f0ef	iris: Only set key->flat_shade if COL0/COL1 are written. This was just laziness on my part, we already added similar checks in the VS key handling. Just need to do it here too. Should improve cache hits.	2019-07-11 00:12:50 -07:00
Kenneth Graunke	cb82d534a0	iris: Drop comment about var->data.binding not being set. I refactored the sampler lowering passes a long time ago to ensure that gl_nir_lower_samplers_as_deref is run and var->data.binding is set.	2019-07-11 00:12:00 -07:00
Kenneth Graunke	38f9954208	iris: Drop comments about missing NOS These stages don't need NOS. If they do, we can add it - the infrastructure is there if we need it someday.	2019-07-11 00:12:00 -07:00
Jason Ekstrand	14781e2122	intel/compiler: Add a "base class" for program keys Right now, all keys have two things in common: a program string ID and a sampler_prog_key_data. I'd like to add another thing or two and need a place to put it. This commit adds a new brw_base_prog_key struct which contains those two common bits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-10 19:35:55 +00:00
Kenneth Graunke	10560f8506	iris: Minor tidying	2019-07-03 22:24:44 -07:00
Timur Kristóf	3b6d787e40	iris: move sysvals to their own constant buffer This commit moves the sysvals to a separate, new constant buffer at the end (before the shader constants). It also allows us to remove the special handling we had for cbuf0, and enables all constant buffers to support user-specified resources and user buffers. v2: (by Kenneth Graunke) - Rebase on the previous patch to fix system value uploading. - Fix disk cache num_cbufs calculation - Fix passthrough TCS to report num_cbufs = 1 so upload actually occurs - Change upload_sysvals to assert that num_cbufs > 0 when num_system_values > 0. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-23 18:33:23 +02:00
Kenneth Graunke	ebc8c20b3e	iris: Mark cbuf0 as not needing uploading every single time I neglected to mark cbuf0_needs_upload = false after uploading it. The obvious fix regressed user clip plane tests, because of a second bug: we also forgot to mark that they may need re-uploading when changing shader programs (which may have more or less system values). Thanks to Timur Kristóf for catching the original issue. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>	2019-06-23 18:32:11 +02:00
Caio Marcelo de Oliveira Filho	f346b277d1	iris: Create binding table slot for num_work_groups only when needed Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-11 17:57:37 -07:00
Kenneth Graunke	30314270d4	iris: Zero shs->cbuf0 when binding a passthrough TCS Fixes valgrind errors when running two CTS tests back to back: - KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT* (The first test has an actual TCS, the second uses passthrough.)	2019-06-07 15:13:42 -07:00
Kenneth Graunke	cd796120c9	iris: Rename bind_state to bind_shader_state. bind_state is possibly the worst name ever. For create, we used create_shader_state, which is more descriptive. Put shader in the name.	2019-06-07 11:26:20 -07:00
Kenneth Graunke	22025595f3	iris: Sweep the NIR in iris_create_uncompiled_shader(). We run a ton of backend specific passes here (mostly brw_preprocess_nir) and ought to sweep up any unused memory at this point, since we're going to hang on to this NIR for as long as the linked program lives.	2019-06-07 01:29:38 -07:00
Jason Ekstrand	bb67a99a2d	intel/nir: Stop returning the shader from helpers Now that NIR_TEST_* doesn't swap the shader out from under us, it's sufficient to just modify the shader rather than having to return in case we're testing serialization or cloning. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-05 20:07:28 +00:00
Kenneth Graunke	34d3103dee	iris: Fix SO stride units for DrawTransformFeedback Mesa measures in DWords. The hardware also claims to measure in DWords. Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ. Which means that it really measures in bytes. So, convert to bytes. Without this, our offset / stride denominator was 1/4th the size it should be, leading to 4x the vertex count that we should have had. Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers	2019-06-03 22:51:18 -07:00
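The unit mismatch described in the commit above can be sketched as follows. This is a minimal illustration with hypothetical names, assuming only what the message states: the stride arrives in DWords, while the hardware write offset is effectively in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive the vertex count for DrawTransformFeedback
 * from a byte-based write offset and a DWord-based stride. The stride is
 * converted to bytes first so that both operands share the same unit. */
static uint32_t
so_vertex_count(uint32_t write_offset_bytes, uint32_t stride_dwords)
{
   assert(stride_dwords > 0);
   uint32_t stride_bytes = stride_dwords * 4;
   return write_offset_bytes / stride_bytes;
}
```

Dividing by the unconverted DWord stride instead would make the denominator four times too small, which is the 4x vertex count the commit message describes.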
Caio Marcelo de Oliveira Filho	045aeccf0e	iris: Always reserve binding table space for NIR constants Don't have a separate mechanism for NIR constants to be removed from the table. If unused, we will compact it away. The use_null_surface is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	5611444809	iris: Print binding tables when INTEL_DEBUG=bt Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	97cd865be2	iris: Compact binding tables Change the iris_binding_table to keep track of what surfaces are actually going to be used, then assign binding table indices just for those. Reducing unused bytes on those is valuable because we use a reduced space for those tables in Iris. The rest of the driver can go from "group indices" (i.e. UBO #2) to BTI and vice-versa using helper functions. The value IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is not used or a certain BTI is not valid. The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be set to skip compacting binding table. v2: (all from Ken) Use BITFIELD64_MASK helper. Improve comments. Assert all groups are marked as used when we have indirects. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	79f1529ae0	iris: Create an enum for the surface groups This will make it convenient to handle compacting and printing the binding table. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	1c8ea8b300	iris: Handle binding table in the driver Stop using brw_compiler to lower the final binding table indices for surface access. This is done by simply not setting the 'prog_data->binding_table.*_start' fields. Then make the driver perform this lowering. This is a better place to perform the binding table assignments, since the driver has more information and will also later consume those assignments to upload resources. This also prepares us for two changes: use ibc without having to implement binding table logic there; and remove unused entries from the binding table. Since the `block` field in brw_ubo_range now refers to the final binding table index, we need to adjust it before using to index shs->constbuf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	518f83236b	iris: Pull brw_nir_analyze_ubo_ranges() call out setup_uniforms We'll change iris to perform lowering of the binding table indices earlier (before the backend kicks in), but the backend compiler uses the result of the analysis to identify load_ubo intrinsics, so we do the analysis after the lowering to have the right indices. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
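The compaction idea in the commit above can be sketched for a single surface group as follows. This is a minimal illustration, not the iris implementation: the function names, the sentinel value, and the single-group layout are all assumptions. Used slots are tracked in a 64-bit mask, and a group index maps to a binding table index (BTI) by counting the used slots below it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel, standing in for IRIS_SURFACE_NOT_USED. */
#define SURFACE_NOT_USED 0xffffffffu

/* Count the set bits in a 64-bit mask (portable, no compiler builtins). */
static uint32_t
count_bits(uint64_t v)
{
   uint32_t n = 0;
   while (v) {
      v &= v - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Map a group index to its compacted BTI, or SURFACE_NOT_USED. */
static uint32_t
group_index_to_bti(uint64_t used_mask, uint32_t group_offset, uint32_t index)
{
   assert(index < 64);
   if (!(used_mask & (UINT64_C(1) << index)))
      return SURFACE_NOT_USED;
   uint64_t below = used_mask & ((UINT64_C(1) << index) - 1);
   return group_offset + count_bits(below);
}

/* Inverse mapping: find the group index stored at a given BTI. */
static uint32_t
bti_to_group_index(uint64_t used_mask, uint32_t group_offset, uint32_t bti)
{
   if (bti < group_offset)
      return SURFACE_NOT_USED;
   uint32_t n = bti - group_offset;
   for (uint32_t i = 0; i < 64; i++) {
      if (used_mask & (UINT64_C(1) << i)) {
         if (n == 0)
            return i;
         n--;
      }
   }
   return SURFACE_NOT_USED;
}
```

For example, with group indices 1, 2 and 4 used and the group starting at BTI 3, those indices land at BTIs 3, 4 and 5, and the unused indices 0 and 3 consume no table space.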
Jason Ekstrand	e459d6d6df	iris: Enable nir_opt_large_constants Shader-db results on Kaby Lake: total instructions in shared programs: 15306230 -> 15304726 (<.01%) instructions in affected programs: 4570 -> 3066 (-32.91%) helped: 16 HURT: 0 total cycles in shared programs: 361703436 -> 361680041 (<.01%) cycles in affected programs: 129388 -> 105993 (-18.08%) helped: 16 HURT: 0 LOST: 0 GAINED: 2 The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal Space Program Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Jason Ekstrand	9dc57eebd5	iris: Don't assume UBO indices are constant It will be true for the constant/system value buffer because they use a constant zero but it's not true in general. If we ever got here when the source wasn't constant, nir_src_as_uint would assert. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2019-05-29 21:09:16 +00:00
Jason Ekstrand	744f93f5c1	iris: Move upload_ubo_ssbo_surf_state to iris_program.c Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Kenneth Graunke	6892d2b94a	iris: Clone before calling nir_strip and serializing This is non-destructive and leaves the debugging information in place. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-29 18:16:32 +00:00
Kenneth Graunke	e1409aead5	iris: Only store the SHA1 of the NIR in iris_uncompiled_shader Jason pointed out that we don't need to keep an entire copy of the serialized NIR around, we just need the SHA1. This does change our disk cache key to be taking a SHA1 of a SHA1, which is a bit odd, but should work out and be faster and use less memory. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-29 18:16:32 +00:00
Kenneth Graunke	6dc1c2d8bd	iris: Fix ALT mode regressions from shader cache We were checking this based on nir->info.name, but with the shader cache enabled, nir_strip throws out the name, causing us to use IEEE mode for ARB programs. gl-1.0-spot-light regressed because it wants ALT mode for 0^0 behavior. Fixes: `dc5dc727d5` iris: Serialize the NIR to a blob we can use for shader cache purposes.	2019-05-21 16:58:54 -07:00
Dylan Baker	601c9bc135	iris: Cache assembly shaders in the on-disk shader cache This implements storing and retrieving iris_compiled_shader objects from the on-disk shader cache. (by Dylan Baker and Kenneth Graunke)	2019-05-21 15:05:38 -07:00
Kenneth Graunke	dc5dc727d5	iris: Serialize the NIR to a blob we can use for shader cache purposes. We will use a hash of the serialized NIR together with brw_prog_*_key (for NOS) as the disk cache key, where the disk cache contains actual assembly shaders. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Kenneth Graunke	6ae2caf201	iris: Move iris_uncompiled_shader definition to iris_context.h It had been internal to iris_program.c, but with the upcoming disk cache code, the "program module" is going to be spread across a couple source files. Into a header it goes! Now it lives alongside iris_compiled_shader, which makes sense. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Kenneth Graunke	646924cfa1	intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. 
This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:30 -07:00
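The thread and register trade-offs between SINGLE_PATCH and 8_PATCH described in the commit above can be sketched as follows. This is a minimal illustration with hypothetical names, not compiler code. It assumes that "(N / 8) threads" rounds up for patch sizes that are not a multiple of 8, and it omits DUAL_PATCH.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum covering the two modes compared here. */
enum tcs_mode { TCS_SINGLE_PATCH, TCS_8_PATCH };

/* Threads launched for PATCHLIST_N: per patch in SINGLE_PATCH mode, or per
 * group of 8 patches in 8_PATCH mode (one thread per patch vertex). */
static unsigned
tcs_threads(enum tcs_mode mode, unsigned n)
{
   assert(n >= 1 && n <= 32);
   if (mode == TCS_SINGLE_PATCH)
      return (n + 7) / 8; /* assumption: round up to whole threads */
   return n;
}

/* 8_PATCH is only usable when at most 16 thread instances are needed. */
static bool
tcs_can_use_8_patch(unsigned n)
{
   return n <= 16;
}

/* Registers consumed by ICP handles in 8_PATCH mode: one per patch vertex. */
static unsigned
tcs_8_patch_icp_regs(unsigned n)
{
   assert(n >= 1 && n <= 32);
   return n;
}
```

A PATCHLIST_3 draw, for instance, needs one partly idle thread per patch in SINGLE_PATCH mode, versus three fully occupied threads per eight patches in 8_PATCH mode.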
Kenneth Graunke	dcfca0af7c	iris: Set XY Clipping correctly. I was setting it based off a pipe_rasterizer_state field that appears to be entirely dead outside of the draw module respecting it. I should be setting it when the primitive type reaching the SF is neither points nor lines. This is, unfortunately, rather dirty, as we have to look at the rasterizer state, the geometry shader state, the tessellation evaluation shader state, and the primitive type...	2019-04-29 10:53:23 -07:00
Kenneth Graunke	4c3c417b00	iris: Move iris_debug_recompile calls before uploading. Order of operations is important, otherwise we'll find the program we just uploaded as the "old" compile and get confused why nothing is different between the two keys. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:20 -07:00
Kenneth Graunke	04f97eefa3	iris: Print the reason for shader recompiles. I was lazy earlier and hadn't bothered typing / refactoring this. Now I'm hitting some extra recompiles and would like to see why. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:18 -07:00


159 commits