fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-29 03:00:23 +01:00

Author	SHA1	Message	Date
Caio Marcelo de Oliveira Filho	e45bf01940	spirv: Change spirv_to_nir() to return a nir_shader spirv_to_nir() returned the nir_function corresponding to the entrypoint, as a way to identify it. There's now a bool is_entrypoint in nir_function and also a helper function to get the entry_point from a nir_shader. The return type reflects better what the function name suggests. It also helps drivers avoid the mistake of reusing internal shader references after running NIR_PASS on it. When using NIR_TEST_CLONE or NIR_TEST_SERIALIZE, those would be invalidated right in the first pass executed. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 10:34:35 -07:00
Kenneth Graunke	bc273dece2	intel/decoder: Use get_state_size() over guessed counts in more cases This makes the following packets use actual driver provided sizes rather than guessing an arbitrary number: - CC_VIEWPORT - SF_CLIP_VIEWPORT - BLEND_STATE - COLOR_CALC_STATE - SCISSOR_RECT Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-05-28 13:44:16 -07:00
Kenneth Graunke	6a9e39d44b	iris: Ask st to vectorize our IO. (Technically this is common code, but it doesn't affect i965 or anv.) Improves performance of GFXBench5/gl_tess_off on Skylake GT4e at 1080p by 9.3933% +/- 0.0305157% by eliminating all spilling in the GS. Improves performance of GFXBench5/gl_4_off (Car Chase) on Skylake GT4e at 1080p by 0.325208% +/- 0.0842233% (n=18). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-28 01:06:48 -07:00
Lionel Landwerlin	2042f22e28	anv: fix apply_pipeline_layout pass for arrays of YCbCr descriptors When using the binding tables to access arrays of YCbCr descriptors we did not consider the offset of the accessed element. We can't do a simple multiple because the binding table entries are tightly packed. For example element 0 of the array could use 2 entries/planes and element 1 could use 2 entries/planes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bb8768b9d` ("anv: toggle on support for VK_EXT_ycbcr_image_arrays") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-05-27 22:47:53 +01:00
Chenglei Ren	13b38ca1e4	anv/android: fix missing dependencies issue during parallel build The libmesa_anv_gen* modules require anv_extensions.h, patch makes sure it gets generated as a dependency before building them. Signed-off-by: Chenglei Ren <chenglei.ren@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2019-05-27 10:13:17 +03:00
Jason Ekstrand	f2dc0f2872	nir: Drop imov/fmov in favor of one mov instruction The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Rob Clark <robdclark@chromium.org>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	8ffbb54405	intel: Implement abs, neg, and sat in the back-end Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	4fde459563	intel/nir: Call alu_to_scalar one last time before out-of-ssa A few of our very late passes can end up generating vectors accidentally so we need to get rid of them. The only known case of this is the ffma peephole which generates fneg and fabs as vectors. Currently, they're not a problem because they get turned into fmov which the back-end compiler knows how to handle as a vector. That's about to change. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	ddd08e1888	nir/builder: Remove the use_fmov parameter from nir_swizzle This flag has caused more confusion than good in most cases. You can validly use imov for floats or fmov for integers because, without source modifiers, neither modify their input in any way. Using imov for floats is more reliable so we go that direction. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Danylo Piliaiev	c82dcf89ae	anv: Do not emulate texture swizzle for INPUT_ATTACHMENT, STORAGE_IMAGE If descriptorType is VK_DESCRIPTOR_TYPE_STORAGE_IMAGE or VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT, the imageView member of each element of pImageInfo must have been created with the identity swizzle. Fixes: `d2aa65eb` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-24 09:20:38 +00:00
Lionel Landwerlin	cb7c9b2a93	vulkan: fix build dependency issue with generated files On machines with many cores, you can run into that issue : ../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory v2: Move declare_dependency around (Eric) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Jan Ziak Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-22 14:07:14 +00:00
Kenneth Graunke	419d9b21e1	intel: Move brw_prog_key_set_id from i965 to the compiler. I want to use it in iris. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Caio Marcelo de Oliveira Filho	cf05ffbfd6	anv: Don't re-use entry_point pointer from spirv_to_nir When running with NIR_TEST_CLONE=1, the pointer will not be valid, as the whole shader is going to be recreated every pass. Prefer using is_entrypoint (to query when looping) and nir_shader_get_entrypoint() instead. Fixes the Vulkan Piglit tests - vulkan/glsl450/frexp-double - vulkan/glsl450/isinf-double - vulkan/shaders/fs-multiple-large-local-array Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-20 16:47:39 -07:00
Caio Marcelo de Oliveira Filho	31a7476335	spirv, radv, anv: Replace ptr_type with addr_format Instead of setting the glsl types of the pointers for each resource, set the nir_address_format, from which we can derive the glsl_type, and in the future the bit pattern representing a NULL pointer. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Jason Ekstrand	1c92358bd8	anv: Only consider minSampleShading when sampleShadingEnable is set From the Vulkan 1.1.107 spec: Sample shading is enabled for a graphics pipeline: - If the interface of the fragment shader entry point of the graphics pipeline includes an input variable decorated with SampleId or SamplePosition. In this case minSampleShadingFactor takes the value 1.0. - Else if the sampleShadingEnable member of the VkPipelineMultisampleStateCreateInfo structure specified when creating the graphics pipeline is set to VK_TRUE. In this case minSampleShadingFactor takes the value of VkPipelineMultisampleStateCreateInfo::minSampleShading. Otherwise, sample shading is considered disabled. In other words, if sampleShadingEnable is set to VK_FALSE, we should ignore minSampleShading. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-17 20:33:57 +00:00
Jason Ekstrand	8413fd136c	anv: Stop forcing bindless for images This was an unintended artifact of my testing of bindless images. We should be choosing bindless or not dynamically. Fixes: `c0d9926df7` "anv: Use bindless handles for images" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-17 19:58:51 +00:00
Jason Ekstrand	d2aa65eb18	anv: Emulate texture swizzle in the shader when needed Now that we have the descriptor buffer mechanism, emulated texture swizzle can be implemented in a very non-invasive way. Previous attempts all tried to extend the push constant based image param mechanism which was gross. This could, in theory, be done much faster with a magic back-end instruction which does indirect MOVs but Vulkan on IVB is already so slow this isn't going to matter much. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104355 Cc: "19.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-17 12:25:58 -05:00
Nanley Chery	629806b55b	anv: Fix some depth buffer sampling cases on ICL+ Don't attempt sampling with HiZ if the sampler lacks support for it. On ICL, the HW docs state that sampling with HiZ is not supported and that instances of AUX_HIZ in the RENDER_SURFACE_STATE object will be interpreted as AUX_NONE. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-05-16 20:54:53 +00:00
Jason Ekstrand	fce0214e94	intel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks For a block with a contiguous chunk of 32 vars that don't need updating, this lets us skip 32 vars at a time. Also, by using bitscan, we only iterate for each set bit rather than testing them all one at a time. Looking at perf (with -O0 which is unfortunately necessary to get reasonable back-traces), this seems to cuts about 50-60% of the time spent in compute_start_end() which is, itself about 4-6% of the run-time. In the real world, with a release driver build, this cuts 1.34% off a full shader-db run. (I ran shader-db 5 times in each configuration). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-16 02:14:40 +00:00
Jason Ekstrand	b2d274c677	intel/fs/ra: Choose a spill reg before throwing away the graph Otherwise, we get an effectively random spill reg because we no longer have the information from RA to guide us. Also, a completely clean graph has undefined data in in_stack which is used for choosing the spill reg so it really is non-deterministic. Fixes: `e99081e76d` "intel/fs/ra: Spill without destroying the..." Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Jason Ekstrand	c19acf321c	intel/fs/ra: Add spill costs to the graph on-demand Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Jason Ekstrand	2c14e2b5bf	intel/fs/ra: Add a helper for discarding the interference graph Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Dave Airlie	4efd04ab18	intel/compiler: use bitset instead of opencoding a 32-bit bitset. (v2) In the future I want to expand this to 128-bits, for vec16 support, so lets just put the code in place to use bitset ranges now. v2: just declare the bitset to be the max of what we should ever see and change assert to reflect it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 07:10:34 +10:00
Dave Airlie	3b2c433167	intel/compiler: remove repeated bit_size / 8 in brw mem lowering pass. Just use a variable already. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 07:10:30 +10:00
Kenneth Graunke	646924cfa1	intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:30 -07:00
Kenneth Graunke	076159b40b	intel/compiler: Move ICP handle fetching into a helper function. This will be significantly different in 8_PATCH mode. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:28 -07:00
Kenneth Graunke	3d84fd29e8	intel/compiler: Don't repeat dispatch max fixing condition Having a single flag will keep both places in sync if the condition gets more complicated. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:27 -07:00
Kenneth Graunke	f0d52cf2b0	intel/compiler: Rename invocation_id_mask to instance_id_mask The payload field is actually "instance" (thread number), which is used to calculate the invocation ID. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:25 -07:00
Kenneth Graunke	d86260719e	intel/compiler: Refactor TCS invocation ID setup into a helper When we add 8_PATCH mode, this will get a bit more complex, so we may as well start by putting it in a helper function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:24 -07:00
Ian Romanick	45c7ff95fc	intel/compiler: Repeat nir_opt_algebraic_late A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles. v2: Fix typo in comment. Noticed by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224634 -> 17224623 (<.01%) instructions in affected programs: 4586 -> 4575 (-0.24%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.36% -0.19% Instructions are helped. total cycles in shared programs: 360828542 -> 360828714 (<.01%) cycles in affected programs: 151159 -> 151331 (0.11%) helped: 49 HURT: 28 helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6 helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42% HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15 HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88% 95% mean confidence interval for cycles value: -13.48 17.95 95% mean confidence interval for cycles %-change: -0.69% 0.84% Inconclusive result (value mean confidence interval includes 0). Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13529544 -> 13529542 (<.01%) instructions in affected programs: 358 -> 356 (-0.56%) helped: 2 HURT: 0 total cycles in shared programs: 357290311 -> 357289678 (<.01%) cycles in affected programs: 178324 -> 177691 (-0.35%) helped: 48 HURT: 40 helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13 helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66% HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6 HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31% 95% mean confidence interval for cycles value: -18.28 3.89 95% mean confidence interval for cycles %-change: -1.01% 0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8159110 -> 8158980 (<.01%) instructions in affected programs: 22719 -> 22589 (-0.57%) helped: 65 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74% 95% mean confidence interval for instructions value: -2.06 -1.94 95% mean confidence interval for instructions %-change: -0.78% -0.68% Instructions are helped. total cycles in shared programs: 188609448 -> 188609214 (<.01%) cycles in affected programs: 1875852 -> 1875618 (-0.01%) helped: 109 HURT: 104 helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4 helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07% HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2 HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for cycles value: -1.95 -0.25 95% mean confidence interval for cycles %-change: -0.04% -0.01% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	a79570099b	intel/fs: Allow cmod propagation to instructions with saturate modifier v2: Add unit tests. Suggested by Matt. All Intel GPUs had similar results. (Ice Lake shown) total instructions in shared programs: 17229441 -> 17228658 (<.01%) instructions in affected programs: 159574 -> 158791 (-0.49%) helped: 489 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.60 x̃: 1 helped stats (rel) min: 0.07% max: 2.70% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.72 -1.48 95% mean confidence interval for instructions %-change: -0.64% -0.58% Instructions are helped. total cycles in shared programs: 360944149 -> 360937144 (<.01%) cycles in affected programs: 1072195 -> 1065190 (-0.65%) helped: 254 HURT: 27 helped stats (abs) min: 2 max: 234 x̄: 30.51 x̃: 9 helped stats (rel) min: 0.04% max: 8.99% x̄: 0.75% x̃: 0.24% HURT stats (abs) min: 2 max: 83 x̄: 27.56 x̃: 24 HURT stats (rel) min: 0.09% max: 3.79% x̄: 1.28% x̃: 1.16% 95% mean confidence interval for cycles value: -30.11 -19.75 95% mean confidence interval for cycles %-change: -0.70% -0.41% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]	2019-05-14 11:38:21 -07:00
Jason Ekstrand	e99081e76d	intel/fs/ra: Spill without destroying the interference graph Instead of re-building the interference graph every time we spill, we modify it in place so we can avoid recalculating liveness and the whole O(n^2) interference graph building process. We make a simplifying assumption in order to do so which is that all spill/fill temporary registers live for the entire duration of the instruction around which we're spilling. This isn't quite true because a spill into the source of an instruction doesn't need to interfere with its destination, for instance. Not re-calculating liveness also means that we aren't adjusting spill costs based on the new liveness. The combination of these things results in a bit of churn in spilling. It takes a large cut out of the run-time of shader-db on my laptop. Shader-db results on Kaby Lake: total instructions in shared programs: 15311224 -> 15311360 (<.01%) instructions in affected programs: 77027 -> 77163 (0.18%) helped: 11 HURT: 18 total cycles in shared programs: 355544739 -> 355830749 (0.08%) cycles in affected programs: 203273745 -> 203559755 (0.14%) helped: 234 HURT: 190 total spills in shared programs: 12049 -> 12042 (-0.06%) spills in affected programs: 2465 -> 2458 (-0.28%) helped: 9 HURT: 16 total fills in shared programs: 25112 -> 25165 (0.21%) fills in affected programs: 6819 -> 6872 (0.78%) helped: 11 HURT: 16 Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	147665d0a2	intel/fs/ra: Put the VGRFs at the end of the nodes This is slightly less convenient in some places but it will make it much easier when we want to start adding nodes dynamically. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	e7b7d572b3	intel/fs/ra: Re-arrange interference setup The old code was arranged by the type of interference being added. It would set up payload registers and then add payload interference for all VGRFs. It would set up MRFs and add MRF interference for all VGRFs. This commit re-arranges things to be organized differently. It first creates and sets up all RA nodes and then groups interference into two new categories: live range and instruction interference. Once all the RA nodes have been set up, it walks the list of VGRFs and sets up their live range interference and then walks the list of instructions and sets up instruction interference. This new arrangement will be advantageous for a future patch but, at the moment, it cuts 2% off the run-time of shader-db on my laptop. Shader-db results on Kaby Lake: total instructions in shared programs: 15311224 -> 15311224 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355544739 -> 355544739 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	0fd60e95fb	intel/fs/ra: Do the spill loop inside RA Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	47b1dcdcab	intel/fs/ra: Only add MRF hack interference if we're spilling The only use of the MRF hack these days is for spilling and there we don't need the precise MRF usage information. If we're spilling then we know pretty well how many MRFs are going to be used. It is possible if the only things that are spilled have fewer SIMD channels than the dispatch width of the shader that this may be more MRFs than needed. That's a risk we're willing to takd. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311224 (<.01%) instructions in affected programs: 16664 -> 16788 (0.74%) helped: 1 HURT: 5 total cycles in shared programs: 355543197 -> 355544739 (<.01%) cycles in affected programs: 731864 -> 733406 (0.21%) helped: 3 HURT: 6 The hurt shaders are all SIMD32 compute shaders where we reserve enough space for a 32-wide spill/fill but don't need it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	69878a9bb0	intel/fs/ra: Pull the guts of RA into its own class This accomplishes two things. First, it makes interfaces which are really private to RA private to RA. Second, it gives us a place to store some common stuff as we go through the algorithm. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	9e00a251be	intel/fs/ra: Move assign_regs further down in the file It's the main function from which all the other functions are called. It belongs at the bottom. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	5d9ac57c8c	intel/fs/ra: Split building the interference graph into a helper Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	472ef2f98d	intel/fs/ra: Initialize grf_used with first_non_payload_grf There's no reason why we need to use the calculated payload_node_count value which is just first_non_payload_grf aligned up. The grf_used value will be aligned up to 16 anyway (which is a much bigger alignment) before being handed off to hardware. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	096ad8a809	intel/fs/ra: Stop adding RA interference to too many SENDS nodes We only have one node per VGRF so this was adding way too much interference. No idea how we didn't catch this before. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355543197 (0.02%) cycles in affected programs: 2472492 -> 2547639 (3.04%) helped: 17 HURT: 20 Fixes: `014edff0d2` "intel/fs: Add interference between SENDS sources" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	88cac12230	intel/fs/ra: Only add dest interference to sources that exist Fixes: `83dedb6354` "i965: Add src/dst interference for certain" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	6212326941	intel/fs: Stop doing extra RA calls In the last phase of the schedule and RA loop, the RA call is redundant if we spill. Immediately afterwards, we're going to see that we couldn't allocate without spilling and call back into RA and tell it to go ahead and spill. We've known about it for a while but we've always brushed over it on the theory that, if you're going to spill, you'll be calling RA a bunch anyway and what does one extra RA hurt? As it turns out, it hurts more than you'd expect. Because the RA interference graph gets sparser with each spill and the RA algorithm is more efficient on sparser graphs, the RA call that we're duplicating is actually the most expensive call in the RA-and-spill loop. There's another extra RA call we do that's a bit harder to see which this also removes. If we try to compile a shader that isn't the minimum dispatch width and it fails to allocate without spilling we call fail() to set an error but then go ahead and do the first spilling RA pass and only after that's complete do we detect the fail and bail out. By making minimum dispatch widths part of the spill condition, we side-step this problem. Getting rid of these extra spills takes the compile time of a nasty Aztec Ruins shader from about 28 seconds to about 26 seconds on my laptop. It also makes shader-db 1.5% faster Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Nanley Chery	29a13eb71d	isl: Add restrictions to isl_surf_get_hiz_surf() Import some restrictions from intel_tiling_supports_hiz() and intel_miptree_supports_hiz(). Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	d57242190e	isl: Add restriction and comments to isl_surf_get_ccs_surf() Import some restrictions and comments from intel_miptree_supports_ccs(). Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	1de089797c	isl: Modify restrictions in isl_surf_get_mcs_surf() Import some restrictions from intel_miptree_supports_mcs() and don't assume that the caller knows which device generations are supported. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Jason Ekstrand	0745d4bd96	anv: Implement VK_KHR_uniform_buffer_standard_layout There's no real work to do here since we already support scalar block layout which is a direct superset of what this extension allows. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-13 17:20:33 -05:00
Vinson Lee	20b42fad9b	intel/tools: Fix build with glibc < 2.27. glibc < 2.27 defines OVERFLOW in /usr/include/math.h. This patch fixes this build error. In file included from ../include/c99_math.h:37:0, from ../src/util/u_math.h:44, from ../src/mesa/main/macros.h:35, from ../src/intel/compiler/brw_reg.h:47, from ../src/intel/tools/i965_asm.h:32, from ../src/intel/tools/i965_gram.y:29: src/intel/tools/i965_gram.tab.c:562:5: error: expected identifier before numeric constant OVERFLOW = 412, ^ Fixes: `70308a5a8a` ("intel/tools: New i965 instruction assembler tool") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110656 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-05-13 11:05:48 -07:00
Mike Blumenkrantz	7b2468bf6e	intel: drop misleading driver name from gen_get_device_info()	2019-05-11 04:14:06 +00:00
Caio Marcelo de Oliveira Filho	3610081daa	anv: Fix limits when VK_EXT_descriptor_indexing is used Update various limits in VkPhysicalDeviceDescriptorIndexingPropertiesEXT that were previously zero to their values from VkPhysicalDeviceLimits. When using VK_EXT_descriptor_indexing, the former limits will apply to all the descriptor layout sets -- not only those using the new feature bits. For the reference, VK_EXT_descriptor_indexing says "There are new descriptor set layout and descriptor pool creation flags that are required to opt in to the update-after-bind functionality, and there are separate maxPerStage* and maxDescriptorSet* limits that apply to these descriptor set layouts which may be much higher than the pre-existing limits. The old limits only count descriptors in non-updateAfterBind descriptor set layouts, and the new limits count descriptors in all descriptor set layouts in the pipeline layout." Fixes: `6e230d7607` "anv: Implement VK_EXT_descriptor_indexing" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-10 15:15:11 -07:00

1 2 3 4 5 ...

4254 commits