fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-02-02 17:30:29 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	74a9e51696	intel/genxml: Delete empty groups They serve no purpose other than to just fill empty space in the packet so each dword has something. Just disallowing empty groups is a bit easier on some of the tools. This does not change the generated packing headers in any way. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-13 07:37:23 -08:00
Jason Ekstrand	54a6f7eaca	anv: Don't crash on invalid heap sizes when the PCI ID is overriden	2017-11-13 07:37:23 -08:00
Kenneth Graunke	9a0465b3a3	intel/tools: Fix detection of enabled shader stages. We renamed "Function Enable" to "Enable", which broke our detection of whether shaders are enabled or not. So, we'd see a bunch of HS/DS packets with program offsets of 0, and think that was a valid TCS/TES. Fixes: `c032cae9ff` (genxml: Rename "Function Enable" to "Enable".) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-12 00:16:40 -08:00
Dylan Baker	854455498c	autotools: Set C++ visibility flags on Intel These flags are set for C sources, but not C++. This causes symbol visibility leaks from the C++ parts of the Intel compiler. Fixes: `700bebb958` ("i965: Move the back-end compiler to src/intel/compiler") Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-11-10 09:41:55 -08:00
Chad Versace	cd6f79a71d	anv/meson: Generate dev_icd.json I tested this in a setup where the builddir was outside of the srcdir. Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-11-09 16:29:33 -08:00
Chad Versace	b7441ef252	anv: Fix architecture in intel_icd.{arch}.json Use the host arch, not the target arch. In Meson and in recent Autotools, the host arch is where the binary will be used. The target arch is useful only when compiling a compiler. See: http://mesonbuild.com/Cross-compilation.html See: https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html Reported-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-11-09 16:29:31 -08:00
Chad Versace	2a4798ad98	anv: Refactor anv_GetImageSubresourceLayout() Its helper function, anv_surface_get_subresource_layout(), was not very helpful. So fold it into the main function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	69e3f0b02e	anv/image: Refactor choice of isl_tiling_flags_t Instead of choosing the tiling flags inside make_surface(), which is called once per aspect in a loop, and which chooses the same tiling for each aspect, choose the tiling flags exactly once before entering the aspect loop. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	7bb4387105	anv: Refactor anv_get_format_plane() - explicit unsupported The same local variable, 'plane_format', was returned on success and failure. Be more explicit in distinguishing the two cases: return 'plane_format' on success and return 'unsupported' on failure. This simplifies the diff in upcoming patches for VK_EXT_image_drm_format_modifier. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	3ee7f4bc2f	anv: Remove anv_physical_device_get_format_properties() Fold its body into its sole caller, anv_GetPhysicalDeviceFormatProperties(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	891d237667	anv: Simplify anv_physical_device_get_format_properties() Now that get_image_format_properties() returns the correct VkFormatFeatureFlags, we can remove the unneeded if-branch and some local variables. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	b3e2ce0580	anv: Simplify anv_get_image_format_properties() Now that get_image_format_features() has a VkImageTiling parameter, we can bypass anv_physical_device_get_format_properties() and call get_image_format_features() directly. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	cd3fe376e0	anv: Rename get_image_format_properties() The name is misleading. It looks like vkGetPhysicalDeviceImageFormatProperties(), but it actually implement vkGetPhysicalDeviceFormatProperties. Let's rename it to what it actually does, get_image_format_features(), because it returns VkFormatFeatureFlags. For consistency, also rename get_buffer_format_properties() to get_buffer_format_features(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	17ac61a2c9	anv: Fix get_image_format_properties() - YCbCr Teach it to calculate the format features for YCbCr. The goal (which is completed in this patch) is to incrementally fix get_image_format_properties() to return a correct result. Previously, it returned incorrect VkFormatFeatureFlags which the caller needed clean up. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	eaa49ec3fc	anv: Fix get_image_format_properties() - 3-channel formats Teach it to calculate the format features for 3-channel formats. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	6394e4a380	anv: Refactor get_image_format_properties() - Reduce params Replace parameters 'enum isl_format' and 'struct anv_format_plane' with new parameter 'const struct anv_format *'. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-09 16:01:59 -08:00
Chad Versace	66647074a4	anv: Refactor get_image_format_properties() - base_isl_format Rename parameter 'base' to 'base_isl_format'. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-09 16:01:59 -08:00
Chad Versace	c22a9f10be	anv: Refactor get_image_format_properties() - plane_format Rename parameter 'format' to 'plane_format'. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	096fc6915b	anv: Refactor get_image_format_properties() - ASTC Teach it to calculate the format features for ASTC. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. v2: New commit message Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	8ae4e97536	anv: Refactor get_image_format_properties() - depthstencil (v2) Teach it to calculate the features of depthstencil formats. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. v2: New commit message Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-11-09 16:01:59 -08:00
Chad Versace	6720abf292	anv: Better types for 'aspect' function params Some functions have a comment that says "Exactly one bit must be in 'aspect'". So change the type of their 'aspect' parameter from VkImageAspectFlags to VkImageAspectFlagBits. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-09 16:01:59 -08:00
Chad Versace	342c811646	anv: Refactor get_buffer_format_properties() Make it a stand-alone function. Pre-patch, for some formats the function returned incorrect VkFormatFeatureFlags which were cleaned up by the caller. This prepares for a cleaner implementation of VK_EXT_image_drm_format_modifier. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-11-09 16:01:59 -08:00
Nicolai Hähnle	ffc2060616	anv: fix build failure Fixes: `e3a8013de8` ("util/u_queue: add util_queue_fence_wait_timeout")	2017-11-09 14:49:19 +01:00
Jason Ekstrand	951a5dc4cc	intel/nir: Use the correct indirect lowering masks in link_shaders Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't lower indirects on the fragment shader which is wrong. Instead of using a single indirect mask, take advantage of our new little helper. Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-08 20:10:04 -08:00
Timothy Arceri	f98a2768ca	mesa: Add new fast mtx_t mutex type for basic use cases While modern pthread mutexes are very fast, they still incur a call to an external DSO and overhead of the generality and features of pthread mutexes. Most mutexes in mesa only needs lock/unlock, and the idea here is that we can inline the atomic operation and make the fast case just two intructions. Mutexes are subtle and finicky to implement, so we carefully copy the implementation from Ulrich Dreppers well-written and well-reviewed paper: "Futexes Are Tricky" http://www.akkadia.org/drepper/futex.pdf We implement "mutex3", which gives us a mutex that has no syscalls on uncontended lock or unlock. Further, the uncontended case boils down to a cmpxchg and an untaken branch and the uncontended unlock is just a locked decr and an untaken branch. We use __builtin_expect() to indicate that contention is unlikely so that gcc will put the contention code out of the main code flow. A fast mutex only supports lock/unlock, can't be recursive or used with condition variables. We keep the pthread mutex implementation around as for the few places where we use condition variables or recursive locking. For platforms or compilers where futex and atomics aren't available, simple_mtx_t falls back to the pthread mutex. The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound applications. Most CPU bound cases are helped and some of our internal bind_buffer_object heavy benchmarks gain up to 10%. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-09 12:07:48 +11:00
Jason Ekstrand	3e63cf893f	intel/nir: Break the linking code into a helper in brw_nir.c Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-08 14:09:51 -08:00
Jason Ekstrand	7364f080f9	intel/nir: Add a helper for getting the NoIndirect mask Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-08 14:09:49 -08:00
Emil Velikov	ba414dba4f	automake: intel: correctly append to the LIBADD variable Commit `05fc62d89f` sets the variable, yet it forgot the update the existing reference to append (instead of assign). Thus as-is the expat library was discarded from the link chain when building with Android. Fixes: `05fc62d89f` ("automake: intel: move expat handling where it's used") Cc: Hongxu Jia <hongxu.jia@windriver.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-11-08 14:23:57 +00:00
Jason Ekstrand	d002950e54	intel/fs/nir: Return Q types from brw_reg_type_for_bit_size Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	dee58ecd2e	intel/fs/nir: Use Q immediates for load_const on gen8+ Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	9bb34892bf	intel/fs/nir: Setup immediates based on type in i2b and f2b Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	1cb210f4bc	intel/reg: Add helpers for 64-bit integer immediates Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2017-11-07 10:41:24 -08:00
Jason Ekstrand	ab9220edd6	nir,intel/compiler: Use a fixed subgroup size The GL_ARB_shader_ballot spec says that gl_SubGroupSizeARB is declared as a uniform. This means that it cannot change across an invocation such as a draw call or a compute dispatch. For compute shaders, we're ok because we only ever use one dispatch size. For fragment, however, the hardware dynamically chooses between SIMD8 and SIMD16 which violates the spec. Instead, let's just pick a subgroup size based on the shader stage. The fixed size we choose for compute shaders is a bit higher than strictly needed but there's no real harm in that. The advantage is that, if they do anything interesting with the value, NIR will see it as an immediate and can optimize better. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	a026458020	nir/lower_subgroups: Lower ballot intrinsics to the specified bit size Ballot intrinsics return a bitfield of subgroups. In GLSL and some SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot, they return a uvec4. Also, some back-ends would rather pass around 32-bit values because it's easier than messing with 64-bit all the time. To solve this mess, we make nir_lower_subgroups take a new parameter called ballot_bit_size and it lowers whichever thing it gets in from the source language (uint64_t or uvec4) to a scalar with the specified number of bits. This replaces a chunk of the old lowering code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	28da82f978	nir: Add a new subgroups lowering pass This commit pulls nir_lower_read_invocations_to_scalar along with most of the guts of nir_opt_intrinsics (which mostly does subgroup lowering) into a new nir_lower_subgroups pass. There are various other bits of subgroup lowering that we're going to want to do so it makes a bit more sense to keep it all together in one pass. We also move it in i965 to happen after nir_lower_system_values to ensure that because we want to handle the subgroup mask system value intrinsics here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1ca3a94427	intel/fs: Don't use automatic exec size inference The automatic exec size inference can accidentally mess things up if we're not careful. For instance, if we have add(4) g38.2<4>D g38.1<8,2,4>D g38.2<8,2,4>D then the destination register will end up having a width of 2 with a horizontal stride of 4 and a vertical stride of 8. The EU emit code sees the width of 2 and decides that we really wanted an exec size of 2 which doesn't do what we wanted. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	dc4cf11dfc	intel/fs: Explicitly set EXECUTE_1 where needed Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ab378734f5	intel/eu: Explicitly set EXECUTE_1 where needed Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	8280560705	intel/eu: Make automatic exec sizes a configurable option We have had a feature in codegen for some time that tries to automatically infer the execution size of an instruction from the width of its destination. For things such as fixed function GS, clipper, and SF programs, this is very useful because they tend to have lots of hand-rolled register setup and trying to specify the exec size all the time would be prohibitive. For things that come from a higher-level IR, however, it's easier to just set the right size all the time and the automatic exec sizes can, in fact, cause problems. This commit makes it optional while enabling it by default. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	7a82ad54bb	intel/fs: Rework zero-length URB write handling Originally we tried to handle this case based on slots_valid. However, there are a number of ways that this can go wrong. For one, we throw away any trailing slots which either aren't written or are set to VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not actually write anything there. Between the lot of these, it was possible to end up in a case where we tried to do a regular URB write but ended up with a length of 1 which is invalid. This commit moves it to the end and makes it based on a new boolean flag urb_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6132992cdb	intel/compiler/fs: Set up subgroup invocation as a system value Subgroup invocation is computed using a vector immediate and some dispatch-aware arithmetic. Unfortunately, due to the vector arithmetic, and the fact that it's frequently read 16-wide, it's not something that can easily be CSEd by the back-end compiler. There are a few different possible approaches to this problem: 1) Emit the code to calculate the subgroup invocation on-the-fly and trust NIR to do the CSE. This is what we were doing. 2) Add a back-end instruction for the subgroup ID. This has the advantage of helping the back-end compiler with CSE but has the downside of very poor scheduling for the calculation because it has to be emitted in the back-end. 3) Emit the calculation at the top of the program and re-use the result. This gets rid of the CSE problem but comes at the cost of an extra live register. This commit switches us from 1) to 3). We choose to store the subgroup invocation values as a W type to reduce the impact of the extra live register. Trusting NIR and using 1) was fine but we're soon going to want to use the subgroup invocation value for other things in the back-end compiler and this makes it much easier to do without having to worry about CSE problems. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	295605c930	intel/cs: Push subgroup ID instead of base thread ID We're going to want subgroup ID for SPIR-V subgroups eventually anyway. We really only want to push one and calculate the other from it. It makes a bit more sense to push the subgroup ID because it's simpler to calculate and because it's a real API thing. The only advantage to pushing the base thread ID is to avoid a single SHL in the shader. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6411defdcd	intel/cs: Re-run final NIR optimizations for each SIMD size With the advent of SPIR-V subgroup operations, compute shaders will have to be slightly different depending on the SIMD size at which they execute. In order to allow us to do dispatch-width specific things in NIR, we re-run the final NIR stages for each sIMD width. One side-effect of this change is that we start rallocing fs_visitors which means we need DECLARE_RALLOC_CXX_OPERATORS. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	4e79a77cdc	intel/compiler: Move the destructor from vec4_visitor to backend_shader Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	16ada419d7	i965/fs: Get rid of the early return in brw_compile_cs Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	80ddfab2f5	intel/cs: Rework the way thread local ID is handled Previously, brw_nir_lower_intrinsics added the param and then emitted a load_uniform intrinsic to load it directly. This commit switches things over to use a specific NIR intrinsic for the thread id. The one thing I don't like about this approach is that we have to copy thread_local_id over to the new visitor in import_uniforms. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	25f7453c9e	intel/fs: Mark 64-bit values as being contiguous This isn't often a problem , when we're in a compute shader, we must push the thread local ID so we decrement the amount of available push space by 1 and it's no longer even and 64-bit data can, in theory, span it. By marking those uniforms contiguous, we ensure that they never get split in half between push and pull constants. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	c4c8cba705	intel/cs: Ignore runtime_check_aads_emit for CS It's only set on gen4-5 which clearly don't support compute shaders. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	d4de813d86	intel/cs: Stop setting dispatch_grf_start_reg Nothing ever reads it for compute shaders because it's always 1. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b1a9cdede4	intel/cs: Drop max_dispatch_width checks from compile_cs The only things that adjust fs_visitor::max_dispatch_width are render target writes which don't happen in compute shaders so they're pointless. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00

... 255 256 257 258 259 ...

15202 commits