fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-27 14:28:22 +02:00

Author	SHA1	Message	Date
Francisco Jerez	812ede088f	intel/fs: Implement quad swizzles on ICL+. Align16 is no longer a thing, so a new implementation is provided using Align1 instead. Not all possible swizzles can be represented as a single Align1 region, but some fast paths are provided for frequently used swizzles that can be represented efficiently in Align1 mode. Fixes ~90 subgroup quad swap Vulkan CTS tests. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Francisco Jerez	c5f9c0009d	intel/fs: Handle source modifiers in lower_integer_multiplication(). lower_integer_multiplication() implements 32x32-bit multiplication on some platforms by bit-casting one of the 32-bit sources into two 16-bit unsigned integer portions. This can give incorrect results if the original instruction specified a source modifier. Fix it by emitting an additional MOV instruction implementing the source modifiers where necessary. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Andrii Simiklit	0206ffc28d	anv/pipeline: remove unnecessary null-pointer check Looks like it is impossible that 'last' variable is a null because at least the get_vs_prog_data shouldn't return a null pointer. So this check is unnecessary starts from commit: `99d497c5b6` "anv/pipeline: Replace get_fs_input_map with ..." This small issue is found by cppcheck. Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 12:29:12 -06:00
Eric Engestrom	4f5a526789	anv: drop unneeded KHR suffix Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:47:56 +00:00
Karol Herbst	d0c6ef2793	nir: rename global/local to private/function memory the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:51:46 +01:00
Lionel Landwerlin	add5a2ec92	anv: flush fast clear colors into compressed surfaces In the following scenario : 1. Create image format R8G8B8A8_UNORM 2. Create image view format R8G8B8A8_SRGB 3. Clear the view through a sub pass to a particular color 4. Barrier on the image to from color attachment to source transfer 5. Copy the image into a linear buffer to check the content The step 4 resolving the clear color is unaware of the SRGB format of the view, because the blorp resolve operations operate on images the color associated with the resolve will not operate on SRGB format but UNORM. Leading to the wrong color being written into surfaces. This change forces a clear color resolve at the end of the render pass so following resolves won't have to deal with the clear color with a format that doesn't match the image's format. On gfxbench vulkan_5_normal 1280x720, this appear to cost us ~0.5fps, from 49.316 down to 48.949. v2: Only fast clear resolve when image & view have different formats (Lionel) v3: Update warning (Jason) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2019-01-08 16:37:00 +00:00
Lionel Landwerlin	366eb656ac	anv: explictly specify format for blorp ccs/mcs op Resolve operations can happen when dealing with view (begin/end subpasses) in which case the view's format needs to apply, not the image's format. v2: Relayout arguments of a ccs_op() call (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911 Cc: mesa-stable@lists.freedesktop.org	2019-01-08 16:36:56 +00:00
Jason Ekstrand	754eff07d2	anv: Sort properties and features switch statements Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-07 18:41:15 -06:00
Jason Ekstrand	05d72d6d48	spirv: Sort supported capabilities Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-07 18:41:15 -06:00
Jason Ekstrand	34af63fa22	anv: Enable the new deref-based UBO/SSBO path Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	63b9aa2e25	spirv: Add support for using derefs for UBO/SSBO access For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	adc155a815	spirv: Add explicit pointer types Instead of baking in uvec2 for UBO and SSBO pointers and uint for push constant and shared memory pointers, make it configurable. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	fc9c4f89b8	nir: Move propagation of cast derefs to a new nir_opt_deref pass We're going to want to do more deref optimizations going forward and this gives us a central place to do them. Also, cast propagation will get a bit more complicated with the addition of ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	6cebeb4f71	glsl_type: Add support for explicitly laid out matrices and arrays SPIR-V allows for matrix and array types to be decorated with explicit byte stride decorations and matrix types to be decorated row- or column-major. This commit adds support to glsl_type to encode this information. Because this doesn't work nicely with std430 and std140 alignments, we add asserts to ensure that we don't use any of the std430 or std140 layout functions with explicitly laid out types. In SPIR-V, the layout information for matrices is applied to the parent struct member instead of to the matrix type itself. However, this is gets rather clumsy when you're walking derefs trying to compute offsets because, the moment you hit a matrix, you have to crawl back the deref chain and find the struct. Instead, we take the same path here as we've taken in spirv_to_nir and put the decorations on the matrix type itself. This also subtly adds support for strided vector types. These don't come up in SPIR-V directly but you can get one as the result of taking a column from a row-major matrix or a row from a column-major matrix. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	f8992eb5ba	anv/apply_pipeline_layout: Set the cursor in lower_res_reindex_intrinsic The loop through instructions doesn't set the cursor for us so unless we set it somewhere, we may end up emitting instructions in the wrong place. The only reason why we haven't been bitten by this in the past is that it only happens in a few variable pointers cases and the CTS tests for those don't use much control flow so things were getting emitted in the correct order by accident. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Bas Nieuwenhuizen	110564fdec	anv/android: Do not reject storage images. We do the ImageFormatProperties check already, and rejecting an usage flag when both ImageFormatProperties and the WSI (which is Android) support it is not allowed. Intel does support storage for some of the support WSI formats, such as R8G8B8A8_UNORM, and looking at the ISL_SURF_USAGE_DISABLE_AUX_BIT, the imported images do not have any form of compression that would prevent this fix. v2: Also consider STORAGE bit for Gralloc usage bits. (From Kevin Strasser <kevin.strasser@intel.com>) Fixes: `053d4c328f` "anv: Implement VK_ANDROID_native_buffer (v9)" Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-07 15:20:55 +01:00
Jason Ekstrand	19c608fe43	intel/blorp: Be more conservative about copying clear colors In `92eb5bbc68` we attempted to avoid copying clear colors whenever we weren't doing a resolve. However, this broke MSAA resolves because we need the clear color in the source. This patch makes blorp much more conservative such that it only avoids the clear color copy if either aux_usage == NONE or it's explicitly doing a fast-clear. Fixes: `92eb5bbc68` "intel/blorp: Only copy clear color when doing..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728 Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-01-04 17:57:43 -06:00
Lionel Landwerlin	da634a4acb	intel/blorp: emit VF caching workaround before 3DSTATE_VERTEX_BUFFERS Probably no difference but it's nice to have i965 & blorp emit things in the same order. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-04 11:18:51 +00:00
Timothy Arceri	50de3f80a8	nir: rename nir_link_constant_varyings() nir_link_opt_varyings() The following patches will add support for an additional optimisation so this function will no longer just optimise varying constants. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Erik Faye-Lund	86089a7316	anv/autotools: make sure tests link with -msse2 Without this, I get the following error when building the tests with autotools on i686: ---8<--- src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’: src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration] __builtin_ia32_clflush(p); ^~~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_pause src/intel/common/gen_clflush.h: In function ‘gen_flush_range’: src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration] __builtin_ia32_mfence(); ^~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_fnclex ---8<--- The erros are generated for each of these files: - mesa/src/intel/vulkan/tests/state_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool.c - mesa/src/intel/vulkan/tests/block_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool_free_list_only.c This is obviously because gen_clflush.h contains code that uses intrinsics that are only available with SSE3. Since the driver already uses SSE3, it seems reasonable to add this to the tests as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Eric Engeström <eric@engestrom.ch>	2018-12-31 17:28:21 +01:00
Erik Faye-Lund	89679e18a9	anv/meson: make sure tests link with -msse2 Without this, I get the following error when building the tests using meson on i686: ---8<--- In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46, from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26: ../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’: ../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration] __builtin_ia32_clflush(p); ^~~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_pause ../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’: ../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration] __builtin_ia32_mfence(); ^~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_fnclex ---8<--- The errors are generated for each of these files: - mesa/src/intel/vulkan/tests/state_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool.c - mesa/src/intel/vulkan/tests/block_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool_free_list_only.c This is obviously because gen_clflush.h contains code that uses intrinsics that are only available with SSE3. Since the driver already uses SSE3, it seems reasonable to add this to the tests as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engeström <eric@engestrom.ch>	2018-12-31 17:27:33 +01:00
Lionel Landwerlin	f7bccf6ab4	intel/aub_viewer: highlight true booleans Useful to spot PIPE_CONTROL flags. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:46 +00:00
Lionel Landwerlin	6ba61ea391	intel/aub_viewer: fold binding/sampler table items Makes things easier to read rather than a long block of text. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:43 +00:00
Lionel Landwerlin	7ab8c80625	intel/aub_viewer: fix shader view Not decoding the shader at the right offset. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:40 +00:00
Lionel Landwerlin	f3ed4a058d	intel/aub_viewer: print address of missing shader Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:21 +00:00
Lionel Landwerlin	0382e11989	intel/aub_viewer: fixup 0x address prefix Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:18 +00:00
Lionel Landwerlin	8e2fda411a	intel/aub_viewer: fix shader get_bo Instruction addresses are always in ppgtt space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:08 +00:00
Lionel Landwerlin	e2ae5f2f0a	anv: don't do partial resolve on layer > 0 We've made the choice not to use fast clears on layer > 0 with multilayer images. This is partly because we would need to store multiple clear colors for each layer, making the existing memory layout, already including aux surfaces, fast clear color, image state, etc... even more complex. Partial resolves are the operations transfering the clear colors into the auxiliary buffers. This operation is currently implemented in Blorp by loading the clear color from the image's BO, into a shader that then samples from the auxiliary buffer and writes the color only if it isn't there already. The problem here is that because we store only one clear color for all layers and it is used for partial resolves. If you trigger a partial clear on a layer > 0, then you're likely to deal with a color that is not what you actually want. In the particular issues below, we have multiple layers, each cleared with a different color but the partial resolve just writes the wrong color into the auxiliary buffers for layers > 0. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108910 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911 Cc: mesa-stable@lists.freedesktop.org	2018-12-24 09:42:46 +00:00
Iago Toral Quiroga	d6110d4d54	intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs The former expects to see SSA-only things, but the latter injects registers. The assertions in the lowering where not seeing this because they asserted on the bit_size values only, not on the is_ssa field, so add that assertion too. Fixes: `11dc130779` "nir: Add a bool to int32 lowering pass" CC: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-20 08:02:44 +01:00
Tapani Pälli	3627c9efff	anv/android: turn on VK_ANDROID_external_memory_android_hardware_buffer Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:42 +02:00
Tapani Pälli	3dc424a4f4	anv: ignore VkSamplerYcbcrConversion on non-yuv formats This fulfills a requirement for clients that want to utilize same code path for images with external formats (VK_FORMAT_UNDEFINED) and "regular" RGBA images where format is known. This is similar to how OES_EGL_image_external works. To support this, we allow color conversion samplers for non-YUV formats but skip setting up conversion when format does not have can_ycbcr flag set. v2: add comment and bundle can_ycbcr to the existing break condition (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	a7b7772cfb	anv: support VkSamplerYcbcrConversionInfo in vkCreateImageView If a conversion struct was passed, then initialize view using format from the conversion structure. v2: use vk_format directly from the anv_format struct v3: added some assertions (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	bb0721aea4	anv: add VkFormat field as part of anv_format Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	c070b0e25f	anv: support VkExternalFormatANDROID in vkCreateSamplerYcbcrConversion If external format is used, we store the external format identifier in conversion to be used later when creating VkImageView. v2: rebase to `b43f955037` changes v3: added assert, ignore components when creating external format conversion (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	f1654fa7e3	anv/android: support creating images from external format Since we don't know the exact format at creation time, some initialization is done only when bound with memory in vkBindImageMemory. v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if image has external format v3: refactor prepare_ahw_image, support vkBindImageMemory2, calculate stride correctly for rgb(x) surfaces, rename as 'resolve_ahw_image' v4: rebase to `b43f955037` changes v5: add some assertions to verify input correctness (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	517103abf1	anv/android: add ahardwarebuffer external memory properties v2: have separate memory properties for android, set usage flags for buffers correctly v3: code cleanup (Jason) + limit maxArrayLayers to 1 for AHardwareBuffer based images v4: rebase to `b43f955037` changes Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	c79a528d2b	anv/android: support import/export of AHardwareBuffer objects v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB) v3: properly handle usage bits when creating from image v4: refactor, code cleanup (Jason) v5: rebase to `b43f955037` changes, initialize bo flags as ANV_BO_EXTERNAL (Lionel) v6: add assert that anv_bo_cache_import succeeds, add comment about multi-bo support to clarify current implementation (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	5c65c60d6c	anv: refactor, remove else block in AllocateMemory This makes it cleaner to introduce more cases where we import memory from different types of external memory buffers. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	884fc90fde	anv: add anv_ahw_usage_from_vk_usage helper function v2: rebase to `b43f955037` changes Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	1e6a44400a	anv/android: add GetAndroidHardwareBufferPropertiesANDROID Use the anv_format address in formats table as implementation-defined external format identifier for now. When adding YUV format support this might need to change. v2: code cleanup (Jason) v3: set anv_format address as identifier v4: setup suggestedYcbcrModel and suggested[X\|Y]ChromaOffset as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL v5: set linear tiling for GPU_DATA_BUFFER usage, add comment about multi-bo support to clarify current implementation (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	aa94e01bfe	anv: add from/to helpers with android and vulkan formats v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason) v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define for now to avoid direct dependency to minigbm headers Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	c1f15a0a1a	anv: make anv_get_image_format_features public This will be utilized later by GetAndroidHardwareBufferPropertiesANDROID. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	8a469fd335	anv: refactor make_surface to use data from anv_image Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	2a98e5bbb9	anv: add create_flags as part of anv_image This will make it possible for next patch to rip anv_image_create_info out from make_surface function. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Jason Ekstrand	3feda3cf35	anv: Bump the patch version to 96 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-18 09:40:46 -06:00
Ian Romanick	af07141b33	intel/compiler: More peephole_select for pre-Gen6 No shader-db changes on any Gen6+ platform. All of the shaders with cycles hurt by more than ~2% are from Master of Orion. All of the shaders have instructions helped. It looks like the pass enables some control flow to be converted to bcsels, then the scheduler does dumb things. These are new shaders (just added before doing this shader-db run), so there's probably some low-hanging fruit. Iron Lake total instructions in shared programs: 8214327 -> 8213684 (<.01%) instructions in affected programs: 84469 -> 83826 (-0.76%) helped: 114 HURT: 26 helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61% 95% mean confidence interval for instructions value: -5.87 -3.32 95% mean confidence interval for instructions %-change: -2.32% -1.17% Instructions are helped. total cycles in shared programs: 187736850 -> 187749314 (<.01%) cycles in affected programs: 506750 -> 519214 (2.46%) helped: 104 HURT: 36 helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16 helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63% HURT stats (abs) min: 4 max: 1402 x̄: 409.67 x̃: 40 HURT stats (rel) min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39% 95% mean confidence interval for cycles value: 28.32 149.74 95% mean confidence interval for cycles %-change: -0.07% 1.61% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 5044014 -> 5043652 (<.01%) instructions in affected programs: 46751 -> 46389 (-0.77%) helped: 63 HURT: 13 helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52% 95% mean confidence interval for instructions value: -6.54 -2.99 95% mean confidence interval for instructions %-change: -3.04% -1.28% Instructions are helped. total cycles in shared programs: 128143042 -> 128150188 (<.01%) cycles in affected programs: 324564 -> 331710 (2.20%) helped: 57 HURT: 19 helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32 helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81% HURT stats (abs) min: 10 max: 1400 x̄: 468.21 x̃: 60 HURT stats (rel) min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70% 95% mean confidence interval for cycles value: 6.90 181.15 95% mean confidence interval for cycles %-change: -0.52% 1.59% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	8fb8ebfbb0	intel/compiler: More peephole select Shader-db results: The one shader hurt for instructions is a compute shader that had both spills and fills hurt. v2: Fix typo in comment noticed by Caio. v3: Fix inverted condition in brw_nir.c. Noticed by Lionel. Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15072761 -> 15047884 (-0.17%) instructions in affected programs: 895539 -> 870662 (-2.78%) helped: 3623 HURT: 1 helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4 helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20% HURT stats (abs) min: 92 max: 92 x̄: 92.00 x̃: 92 HURT stats (rel) min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92% 95% mean confidence interval for instructions value: -7.10 -6.63 95% mean confidence interval for instructions %-change: -4.03% -3.82% Instructions are helped. total cycles in shared programs: 369738930 -> 369535732 (-0.05%) cycles in affected programs: 68027851 -> 67824653 (-0.30%) helped: 2609 HURT: 1035 helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77 helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47% HURT stats (abs) min: 1 max: 33336 x̄: 261.04 x̃: 20 HURT stats (rel) min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47% 95% mean confidence interval for cycles value: -96.43 -15.09 95% mean confidence interval for cycles %-change: -6.07% -5.36% Cycles are helped. total spills in shared programs: 10158 -> 10159 (<.01%) spills in affected programs: 166 -> 167 (0.60%) helped: 1 HURT: 1 total fills in shared programs: 22105 -> 22116 (0.05%) fills in affected programs: 837 -> 848 (1.31%) helped: 4 HURT: 1 Ivy Bridge total instructions in shared programs: 12021190 -> 11990256 (-0.26%) instructions in affected programs: 910561 -> 879627 (-3.40%) helped: 3344 HURT: 18 helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6 helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31% HURT stats (abs) min: 2 max: 20 x̄: 7.89 x̃: 6 HURT stats (rel) min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70% 95% mean confidence interval for instructions value: -9.49 -8.91 95% mean confidence interval for instructions %-change: -5.32% -4.98% Instructions are helped. total cycles in shared programs: 179077826 -> 178570196 (-0.28%) cycles in affected programs: 63205667 -> 62698037 (-0.80%) helped: 2767 HURT: 620 helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88 helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09% HURT stats (abs) min: 1 max: 31255 x̄: 152.27 x̃: 11 HURT stats (rel) min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58% 95% mean confidence interval for cycles value: -173.94 -125.81 95% mean confidence interval for cycles %-change: -7.68% -6.97% Cycles are helped. Sandy Bridge total instructions in shared programs: 10852569 -> 10843758 (-0.08%) instructions in affected programs: 235803 -> 226992 (-3.74%) helped: 800 HURT: 0 helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8 helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36% 95% mean confidence interval for instructions value: -11.93 -10.10 95% mean confidence interval for instructions %-change: -4.99% -4.39% Instructions are helped. total cycles in shared programs: 154732047 -> 154608941 (-0.08%) cycles in affected programs: 4063110 -> 3940004 (-3.03%) helped: 606 HURT: 253 helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62 helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81% HURT stats (abs) min: 1 max: 1966 x̄: 59.36 x̃: 11 HURT stats (rel) min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67% 95% mean confidence interval for cycles value: -170.49 -116.13 95% mean confidence interval for cycles %-change: -2.61% -1.65% Cycles are helped. No change on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	4cd1a0be76	i965/vec4: Propagate conditional modifiers from more compares to other compares If there is a CMP.NZ that compares a single component (via a .zzzz swizzle, for example) with 0, it can propagate its conditional modifier back to a previous CMP that writes only that component. The specific case that I saw was: cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF ... cmp.nz.f0(8) null<1>D g42<4>.xD 0D In this case we can just delete the second CMP. No changes on Broadwell or Skylake because they do not use the vec4 backend. Also no changes on GM45 or Iron Lake. Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown) total instructions in shared programs: 10856676 -> 10852569 (-0.04%) instructions in affected programs: 228322 -> 224215 (-1.80%) helped: 1331 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4 helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83% 95% mean confidence interval for instructions value: -3.19 -2.99 95% mean confidence interval for instructions %-change: -1.93% -1.83% Instructions are helped. total cycles in shared programs: 154788865 -> 154732047 (-0.04%) cycles in affected programs: 2485892 -> 2429074 (-2.29%) helped: 1097 HURT: 59 helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64 helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22% HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2 HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71% 95% mean confidence interval for cycles value: -51.04 -47.26 95% mean confidence interval for cycles %-change: -3.40% -3.07% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00

... 66 67 68 69 70 ...

7038 commits