fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 17:48:15 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	803fad43c3	intel/nir: Add a memory barrier before barrier() Our barrier instruction does not implicitly do a memory fence but the GLSL barrier() intrinsic is supposed to. The easiest back-portable solution is to just add the NIR barriers. We'll sort this out more properly in later commits. Cc: mesa-stable@lists.freedesktop.org Closes: #2138 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2020-01-07 21:52:19 -06:00
Kenneth Graunke	defb3a9465	anv: Only enable EWA LOD algorithm when doing anisotropic filtering. Updated documentation renames "Anisotropic Algorithm" to "LOD Algorithm" and adds a note for Gen9+ saying "The EWA Algorithm should only be enabled for Anisotropic Filtering modes." and indicating that the extra accuracy shouldn't be necessary for other modes, and comes at a cost. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2020-01-04 14:27:22 -08:00
Jason Ekstrand	52ad1712ed	anv: Allow HiZ in TRANSFER_SRC_OPTIMAL on Gen8-9 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-04 12:25:54 -08:00
Jason Ekstrand	b274469daa	intel/blorp: Use the source format when using blorp_copy with HiZ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-04 12:25:54 -08:00
Jason Ekstrand	95cc5438eb	blorp: Allow reading with HiZ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-04 12:25:54 -08:00
Jason Ekstrand	4a1093005c	blorp: Stop whacking Z24 depth to BGRA8 The shader code required to do this is int(sat(x) * UINT24_MAX) which isn't really worth all the effort to avoid. Doing the format conversion, on the other hand, prevents us from sampling with HiZ which is something that we very much want on gen8-9 where we can. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-04 12:25:54 -08:00
Caio Marcelo de Oliveira Filho	75a19186b2	anv: Ignore some CreateInfo structs when rasterization is disabled According to the description of VkGraphicsPipelineCreateInfo(), pViewportState, pMultisampleState, pDepthStencilState and pColorBlendState must be ignored when rasterization is not enabled. This avoids potentially invalid pointers being dereferenced when rasterization is disabled. Tested with `demos_x64 VK_Parameter_Zoo` from Renderdoc repository. v2: Don't store the `raster_enabled` as part of anv_pipeline, just query it from the create info. This avoids storing a state that's only used during pipeline creation. (Jason) Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2258 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric@engestrom.ch> [v1] Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2020-01-03 13:57:31 -08:00
Caio Marcelo de Oliveira Filho	6755b6315b	anv: Drop unused function parameter Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2020-01-03 13:29:49 -08:00
Jason Ekstrand	9bd8000c6c	anv: Drop unneeded struct keywords All VkFoo structs are typedef'd to not need the struct keyword. Leaving it in there is just extra characters and breaks Vulkan's aliasing when stuff gets promoted to core versions. It's better to just never use struct for VkFoo. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2020-01-03 11:32:34 -06:00
Kenneth Graunke	d0d28c783d	iris: Set nir_shader_compiler_options::unify_interfaces. This is technically enabling the option in the common intel backend code, but only the st/nir linker uses the option, so it's iris-only. Fixes Piglit's spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out Closes: #2274 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>	2020-01-03 00:41:50 +00:00
Kenneth Graunke	7a9c0fc0d7	intel: Drop Gen11 WaBTPPrefetchDisable workaround This isn't needed on production Icelake hardware. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3250> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3250>	2020-01-03 00:20:17 +00:00
Jason Ekstrand	ac70442ce1	anv: Properly advertise sampledImageIntegerSampleCounts We support the same set of samples for integer color formats as for non-integer. We've been advertising it wrong since before the initial Vulkan 1.0 release. :-( Fixes: `d689745303` "vk/0.210.0: Rework device features and limits" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-24 08:31:44 -06:00
Ross Zwisler	cabcbb4db0	intel: limit shader geometry on BDW GT1 Similar to the SKL GT1 fix introduced here: `b1ba7ffdbd` we need to limit the .urb.max_entries[MESA_SHADER_GEOMETRY] on BDW GT1 to address failures in these two tests: dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_2d_array The value 690 was found via bisection. 691 is the actual max on the hardware I'm using, but 690 seemed like a nice round number. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3173> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3173>	2019-12-20 10:47:52 +00:00
Lionel Landwerlin	afdc0121b5	i965/iris/perf: factor out frequency register capture Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3113>	2019-12-18 14:23:17 +02:00
Caio Marcelo de Oliveira Filho	c61ad77cd2	anv/gen12: Temporarily disable VK_KHR_buffer_device_address (and EXT) For the sake of our testing infrastructure, disable this extension for TGL until we can sort out a hang in Vulkan CTS. Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-17 11:07:41 -08:00
Caio Marcelo de Oliveira Filho	766fdeccf9	intel/vec4: Fix lowering of multiplication by 16-bit constant Existing code was ignoring whether the type of the immediate source was signed or not. If the source was signed, it would ignore small negative values but it also would wrongly accept values between INT16_MAX and UINT16_MAX, causing the actual value to later be reinterpreted as a negative number (under 16-bits). Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older platforms that don't support MUL with 32x32 types and use vec4. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-17 10:45:22 -08:00
Caio Marcelo de Oliveira Filho	2137be22fa	intel/fs: Fix lowering of dword multiplication by 16-bit constant Existing code was ignoring whether the type of the immediate source was signed or not. If the source was signed, it would ignore small negative values but it also would wrongly accept values between INT16_MAX and UINT16_MAX, causing the actual value to later be reinterpreted as a negative number (under 16-bits). Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms that don't support MUL with 32x32 types, including ICL and TGL. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-17 10:45:22 -08:00
Iván Briano	a649bbffee	anv: Export VK_KHR_buffer_device_address only when really supported Fixes: `1b6991ba1d` ("anv: Implement VK_KHR_buffer_device_address") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>	2019-12-16 19:24:46 +00:00
Iván Briano	0fd93b9589	anv: Export filter_minmax support only when it's really supported Fixes: `bea4d4c78c` ("anv: add VK_EXT_sampler_filter_minmax support") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>	2019-12-16 19:24:46 +00:00
Lionel Landwerlin	c056193288	anv: drop unused parameter from apply layout pass Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-16 14:35:25 +02:00
Lionel Landwerlin	7c223cf316	anv: constify pipeline layout in nir passes Was hoping to find potential issues but nothing. Still probably a good idea. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-16 14:35:22 +02:00
Caio Marcelo de Oliveira Filho	c06ba83589	intel/fs: Lower 64-bit MOVs after lower_load_payload() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>	2019-12-14 21:12:21 +00:00
Eric Engestrom	c327245257	anv: drop unused #include Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-13 20:42:40 +00:00
Eric Engestrom	d600b19640	intel/compiler: replace `0` pointer with `NULL` Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-13 20:16:20 +00:00
Eric Engestrom	8074f68b3b	intel/compiler: add ASSERTED annotation to avoid "unused variable" warning Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-13 20:16:20 +00:00
Lionel Landwerlin	bd888bc1d6	i965/iris: perf-queries: don't invalidate/flush 3d pipeline Our current implementation of performance queries is fairly harsh because it completely flushes and invalidates the 3d pipeline caches at the beginning and end of each query. An argument can be made that this is how performance should be measured but it probably doesn't reflect what the application is actually doing and the actual cost of draw calls. A more appropriate approach is to just stall the pipeline at scoreboard, so that we measure the effect of a draw call without having the pipeline in a completely pristine state for every draw call. v2: Use end of pipe PIPE_CONTROL instruction for Iris (Ken) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-13 11:27:22 +02:00
Lionel Landwerlin	a575b3cd5c	intel/perf: drop batchbuffer flushing at query begin This was initially intended to fix issues with the query timings going occasionally high. It turns out there was a bug in the attribution of OA reports to our context when parsing the OA data. This led to reports flagged with other context IDs to be included in our queries results. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-12-13 11:27:17 +02:00
Lionel Landwerlin	2c5eb1df68	anv: fix assumptions about temporary fence payload Since `f9a3d9738b` temporary BO_WSI are definitely a thing so we have an assert wrong. Take that opportunity to expand a bit on an existing comment. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `f9a3d9738b` ("anv: Use BO fences/semaphores for AcquireNextImage") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ivan Briano <ivan.briano@intel.com>	2019-12-12 10:10:48 +00:00
Lionel Landwerlin	52bc235f2a	anv: fix fence underlying primitive checks We appear to have got lucky that the only type of temporary fence payload we could have was a syncobj and that would only happen when the type of the permanent payload was also a syncobj. This code was broken if that assumption changed and it did in commit `f9a3d9738b`. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ivan Briano <ivan.briano@intel.com>	2019-12-12 10:10:48 +00:00
Jason Ekstrand	776cfde699	anv: Bump the advertised patch version to 129 We've been keeping up with the spec updates. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 18:52:08 +00:00
Jason Ekstrand	5f5f5019bd	anv: Unconditionally advertise Vulkan 1.1 Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support to be actually usable. However, it doesn't strictly require that we support any external handle types. We should be able to advertise 1.1 even on old kernels that don't have syncobj support. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 18:52:08 +00:00
Jason Ekstrand	98a83d0fce	anv: Flush the queue on DeviceWaitIdle When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we don't, we only have BO waits and those aren't quite as nice. This commit adds a flag to _anv_queue_submit to wait for the queue to drain before returning. This gives us the behavior we need to implement DeviceWaitIdle. Fixes: `246261f0ad` "anv: prepare the driver for delayed submissions" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 18:52:08 +00:00
Eric Engestrom	b2dac806f8	intel: add mi_builder_test for gen12 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-11 15:38:19 +00:00
Kenneth Graunke	69d7782b15	intel/decoder: Make get_state_size take a full 64-bit address and a base i965 wants to use an offset from a base because everything is in a single buffer whose address may be relocated, and all base addresses are set to the start of that buffer. iris wants to use a full 64-bit address, because state lives in separate buffers which may be in the shader, surface, and dynamic memory zones, where addresses grow downward from the top of a 4GB zone. So it's very possible for a 32-bit offset to exist relative to multiple bases, leading to the wrong state size.	2019-12-10 19:10:49 -08:00
Kenneth Graunke	0f2f561a10	anv: Enable Gen11 Color/Z write merging optimization TCCNTLREG contains additional L3 cache write merging optimizations. The default value on my system appears to be: - URB Partial Write Merging (bit 0) - L3 Data Partial Write Merging (bit 2) - TC Disable (bit 3) Windows drivers appear to set bit 1 as well to enable "Color/Z Partial Write Merging". This should solve an issue we were seeing where MRT benchmarks were using substantially more bandwidth than they ought. However, we have not observed it to cause measurable FPS gains. It is unclear whether we should be setting bit 0 or bit 3, so for now we leave those at the hardware default value. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-10 16:19:46 -08:00
Kenneth Graunke	0b74f85870	intel/genxml: Add a partial TCCNTLREG definition TCCNTLREG contains additional cache programming settings. In particular, there are several write combining controls we'd like to use. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-10 16:19:33 -08:00
Jason Ekstrand	41691ac016	ANV: Stop advertising smoothLines support on gen10+ Reviewed-by: Ivan Briano <ivan.briano@intel.com>	2019-12-10 20:13:56 +00:00
Lionel Landwerlin	5fdea9f401	anv: fix incorrect VMA alignment for CCS main surfaces Maybe a finer way of dealing with this requirement would be to increase the number of pdevice->memory.types[] to add a category for special alignment cases. Meanwhile this fixes the problem of CCS surface alignment and it's probably not going to cause issues given the size of our address space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `6af8a4acc4` ("anv: Add aux-map translation for gen12+") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-10 16:06:54 +00:00
Lionel Landwerlin	dcfe1903c3	anv: fix missing gen12 handling Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `181be14d43` ("anv: Build for gen12") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-10 16:06:54 +00:00
Anuj Phogat	1a32fbd48c	intel: Add pci-ids for Jasper Lake Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-09 12:22:57 -08:00
Anuj Phogat	11fdd5f52c	intel: Add device info for 1x4x6 Jasper Lake Also removing the FIXME comments after matching the numbers with updated documentation. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-09 12:22:56 -08:00
Jason Ekstrand	0f60aa4037	anv: Re-emit all compute state on pipeline switch It's a very odd case to hit in the real world. However, there are some CTS tests which switch back and forth between dispatch and clear without changing the pipeline. Fixes: `bc612536eb` "anv: Emit a dummy MEDIA_VFE_STATE before switching..." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-07 04:03:35 +00:00
Jason Ekstrand	bce1c3c668	anv: Re-capture all batch and state buffers When we moved from allocating BOs directly to using the BO cache, we lost the EXEC_OBJECT_CAPTURE flag on all our state buffers. Fixes: `3119b96bdf` "anv: Allocate block pool BOs from the cache" Fixes: `ee77938733` "anv: Allocate batch and fence buffers from..." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-07 04:03:35 +00:00
Jason Ekstrand	865ffe4e02	anv: Return VK_ERROR_OUT_OF_DEVICE_MEMORY for too-large buffers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-06 22:32:05 +00:00
Jason Ekstrand	f9a3d9738b	anv: Use BO fences/semaphores for AcquireNextImage Instead of doing a dummy submit on the command buffer for the fence or a dummy semaphore and trusting in implicit sync, this commit moves us to take advantage of implicit sync and just use the WSI image BO as the fence. Both semaphores and fences require a tiny bit of extra plumbing to do this but the result is that we can get rid of a bunch of the extra synchronization we're doing today. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-06 19:58:07 +00:00
Jason Ekstrand	ecc119a96e	anv: Add a fence_reset_reset_temporary helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-06 19:58:07 +00:00
Jason Ekstrand	ccb7d606f1	anv: Use submit-time implicit sync instead of allocate-time In `83b943cc2f`, we started making all VkDeviceMemory BOs resident all the time. One unfortunate side-effect of this is that every vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which means that X server or Wayland compositor, instead of waiting on the last vkQueueSubmit to actually write the buffer, now waits on the last vkQueueSubmit from that driver instance relative to whenever the compositor's GL driver instance calls execbuf. This potentially leads to a lot of extra synchronization that we didn't intend to have. Instead, this commit makes it so that we leave WSI memory objects with EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of vkQueuePresent. This should hopefully result in tighter integration with the compositor, lower latency, and better performance. Testing with DOOM 2016, this seems to reduce latency by at least a frame if not two and makes the game much more responsive. Testing was, however, subjective, so we don't have any hard data on that. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-06 19:58:07 +00:00
Jason Ekstrand	6ebf677cfd	anv: Always add in EXEC_OBJECT_WRITE when specified in extra_flags Otherwise, we're trusting in the execbuf_add_bo which sets EXEC_OBJECT_WRITE to always be the first one that gets called. This is likely true for fences but it seems somewhat fragile. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-06 19:58:07 +00:00
Jason Ekstrand	1b6991ba1d	anv: Implement VK_KHR_buffer_device_address The primary difference between the KHR and EXT versions of the extension is that the KHR provides the address at AllocateMemory time for replay so we can replay it safely without moving to a sparse address model. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-05 10:59:10 -06:00
Jason Ekstrand	4428cd9127	anv: Use a pNext loop in AllocateMemory This function has a lot of possible extensions and some of them we can easily handle on-the-fly so it's easier to just have a loop than to find each structure manually. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-12-05 10:59:10 -06:00
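
The "pNext loop" mentioned in `4428cd9127` can be pictured roughly as follows. This is only a sketch of the general pattern, not the anv code: the `demo_*` types, enum values, and `walk_chain` are hypothetical stand-ins for Vulkan's sType/pNext chain, chosen so the example compiles without the Vulkan headers.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Vulkan structure types (assumption, not real enums). */
enum demo_stype {
   DEMO_STYPE_ALLOC_INFO = 1,
   DEMO_STYPE_EXPORT_INFO = 2,
   DEMO_STYPE_DEDICATED_INFO = 3,
   DEMO_STYPE_UNKNOWN = 99,
};

/* Hypothetical analogue of a base "in" structure: a type tag plus a next pointer. */
struct demo_base_in {
   enum demo_stype stype;
   const struct demo_base_in *next;
};

/* What the walk found; fields are illustrative only. */
struct demo_result {
   int saw_export;
   int saw_dedicated;
   int ignored;
};

/* Visit every extension structure once and dispatch on its type tag,
 * instead of searching the chain separately for each known structure. */
static struct demo_result
walk_chain(const struct demo_base_in *first_ext)
{
   struct demo_result r = { 0, 0, 0 };

   for (const struct demo_base_in *ext = first_ext; ext != NULL; ext = ext->next) {
      switch (ext->stype) {
      case DEMO_STYPE_EXPORT_INFO:
         r.saw_export = 1;
         break;
      case DEMO_STYPE_DEDICATED_INFO:
         r.saw_dedicated = 1;
         break;
      default:
         /* Unrecognized structures are skipped, not treated as errors. */
         r.ignored++;
         break;
      }
   }
   return r;
}
```

A caller would pass the `next` pointer of its top-level structure, so the top-level structure itself is not counted as an extension.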

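The signedness problem described in `766fdeccf9` and `2137be22fa` can be illustrated with a small range check. This is a sketch under assumptions, not the Mesa implementation: both function names are hypothetical, and the "type-blind" variant is only one plausible reading of the behavior those messages describe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type-aware check: can a 32-bit immediate be narrowed to
 * 16 bits without changing its value, given the signedness of its type? */
static bool
imm_fits_in_16_bits(uint32_t bits, bool is_signed)
{
   if (is_signed) {
      /* Reinterpret the raw bits as signed. Converting an out-of-range
       * value is implementation-defined in C, but common compilers wrap
       * as two's complement. */
      int32_t v = (int32_t)bits;
      return v >= INT16_MIN && v <= INT16_MAX;
   }
   return bits <= UINT16_MAX;
}

/* Hypothetical type-blind check, for comparison: it treats every
 * immediate as unsigned, so it rejects small negative values and accepts
 * signed values above INT16_MAX that would turn negative in 16 bits. */
static bool
imm_fits_ignoring_type(uint32_t bits)
{
   return bits <= UINT16_MAX;
}
```

With these definitions, a signed immediate of -3 passes the type-aware check but not the type-blind one, while a signed immediate of 40000 does the opposite.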

4997 commits