fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-06 05:08:08 +02:00

Author	SHA1	Message	Date
Eric Anholt	f2ea936f48	v3d: Skip emitting texture config parameter 2 if it's just the defaults. shader-db: total instructions in shared programs: 91275 -> 90768 (-0.56%) instructions in affected programs: 20702 -> 20195 (-2.45%)	2018-07-23 10:21:43 -07:00
Eric Anholt	421e99d777	v3d: Update an XXX comment for a path we handled in HW on V3D 4.x.	2018-07-23 10:21:43 -07:00
Eric Anholt	e7ae900341	v3d: Switch to using the new SFU instructions on V3D 4.x. These instructions let us write directly to the phys regfile, instead of just R4. That lets us avoid moving out of R4 to avoid conflicting with other SFU results, and to avoid conflicting with thread switches. There is still an extra instruction of latency, which is not represented in the scheduler at the moment. If you use the result before it's ready, the QPU will just stall, unlike the magic R4 mode where you'd read the previous value. That means that the following shader-db results aren't quite representative (since we now cause some stalls instead of emitting nops), but they're impressive enough that I'm happy with the change. total instructions in shared programs: 95669 -> 91275 (-4.59%) instructions in affected programs: 82590 -> 78196 (-5.32%)	2018-07-23 10:21:43 -07:00
Eric Anholt	58c1d3860f	v3d: Add QPU pack/unpack for the new SFU instructions. These instructions allow writing the result to any register, instead of a special writeback to r4.	2018-07-23 10:21:43 -07:00
Eric Anholt	cdfa99657d	v3d: Fix the name of the "flpop" operation. Noticed while trying to sort a new op into the appropriate place to match the documentation.	2018-07-23 10:21:43 -07:00
Eric Anholt	91e24e5718	v3d: Print the instruction we're testing in the QPU disasm/pack round-trip. If we fail initial disassembly, it's good to know what instruction it was that failed.	2018-07-23 10:21:42 -07:00
Eric Anholt	a1beb333d8	v3d: Drop unused vir_SAT() operation. We lower saturates in NIR.	2018-07-23 10:21:42 -07:00
Eric Anholt	8dfc6ee317	v3d: Rotate through registers to improve post-RA scheduling options. Similarly to VC4's implementation, by not picking r0 immediately upon freeing it, we give the scheduler more of a chance to fit later writes in earlier. I'm not clear on whether there's any real cost to picking phys over accumulators, so keep that behavior for now. shader-db: total instructions in shared programs: 96831 -> 95669 (-1.20%) instructions in affected programs: 77254 -> 76092 (-1.50%)	2018-07-23 10:21:42 -07:00
Eric Anholt	1fb31819ae	v3d: Allow reading from physical regs written in the previous instruction. This restriction existed in V3D 2.x, but lifting it was a major change in 3.x. shader-db results: total instructions in shared programs: 98117 -> 96831 (-1.31%) instructions in affected programs: 48520 -> 47234 (-2.65%)	2018-07-23 10:21:23 -07:00
Eric Engestrom	e6e22e4207	anv: remove unnecessary runtime copy of static string It's actually also a bit safer, since now the compiler will warn if the string is larger than the `.name` array. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-23 17:56:08 +01:00
Alex Smith	54f8f1545f	anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT According to the spec, these should apply to all read/write access types (so would be equivalent to specifying all other access types individually). Currently, they were doing nothing. v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask. Signed-off-by: Alex Smith <asmith@feralinteractive.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-23 15:29:43 +01:00
Erik Faye-Lund	dc938b8398	virgl: remove unused stride-arguments The IOCTLs don't pass this along, so computing them in the first place is kinda pointless. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-07-23 11:21:09 +01:00
Samuel Pitoiset	6c58bc8d9c	radv: print a big warning when RADV_TRACE_FILE is set Users shouldn't use this debugging option except when we ask them to do! Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-23 11:34:42 +02:00
Samuel Pitoiset	6e32d9e7b0	radv: fix a memleak for merged shaders on GFX9 modules[i] can be NULL for merged shaders but we have to free the NIR code. radv_can_dump_shader_stats() already handles if modules[i] is NULL, no need to check it twice. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-23 11:34:39 +02:00
Jason Ekstrand	d0ee0a0a5d	intel/blorp: Fix blits to R8G8B8_UNORM_SRGB sRGB harder The first fix attempt contained a nasty typo which somehow didn't get caught in review. It also didn't work as intended because the sRGB conversion was happening but then throwing away all but the red channel because it didn't know it was RGB. Really, it's my fault for trying to fix a bug without first writing tests. I've now written tests and they pass with this change. :) Fixes: `11712b9ca1` "intel/blorp: Fix blits to R8G8B8_UNORM_SRGB" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-07-23 00:36:39 -07:00
Jason Ekstrand	abd629eb3d	anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV We've had several broadwell hangs that have come down to this bit just not working correctly. Most recently, we've had a pile of hangs reported with apps running under DXVK: https://github.com/doitsujin/dxvk/issues/469 Instead, use the bit that doesn't try to imply weird D3D coherency things and just force-enables the PS like we want. cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-22 23:43:19 -07:00
Jason Ekstrand	b99493c628	anv: Properly handle GetImageSubresourceLayout on complex images We support mipmapped and arrayed linear images so we need to support vkGetImageSubresourceLayout on them. Fortunately, it's just a trivial call into ISL. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-07-22 23:24:10 -07:00
Timothy Arceri	78f391d343	radeonsi/nir: make use of nir_lower_load_const_to_scalar() This allows NIR to CSE more operations. LLVM does this also so the impact is limited, however doing this in NIR allows other opts to make progress. For example some loops in Civilization Beyond Earth shaders are unrolled. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-23 09:48:51 +10:00
Ilia Mirkin	257128079c	anv/gen9: expose VK_EXT_post_depth_coverage Note that the use of ICMS_INNER_CONSERVATIVE disagrees with the GL driver. Perhaps it's more performant than ICMS_NORMAL and is otherwise permitted? Not sure, so I left it as-is. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-22 14:56:44 -07:00
Ilia Mirkin	768f143667	spirv: add support for SPV_KHR_post_depth_coverage Allow the capability to be exposed, and convert the new execution mode into fs state. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-22 14:56:36 -07:00
Mauro Rossi	6cbbd5b4f8	android: util/disk_cache: fix building errors in gallium drivers This patch applies the necessary changes in Android.common.mk as per automake rules, to avoid following building error: external/mesa/src/gallium/drivers/nouveau/nouveau_screen.c:159:8: error: implicit declaration of function 'disk_cache_get_function_timestamp' is invalid in C99 [-Werror,-Wimplicit-function-declaration] if (disk_cache_get_function_timestamp(nouveau_disk_cache_create, ^ 1 error generated. (v2) -DENABLE_SHADER_CACHE Android cflag is kept, to leave the AS-IS capability enabled Fixes: `cc10b34` ("util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-21 12:06:38 +02:00
Chih-Wei Huang	e7ffd3fb08	Android: fix a missing nir_intrinsics.h error The commit `76dfed8ae2` changed nir_intrinsics.h to be a generated header, but the corresponding dependency was not updated for Android. It causes the error: [ 0% 19/4336] target C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c ... In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25: In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28: In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140: In file included from external/mesa/src/amd/common/ac_llvm_build.h:30: external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found ^~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `76dfed8ae2` ("nir: mako all the intrinsics") Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>	2018-07-21 08:50:23 +02:00
Bas Nieuwenhuizen	e1febbefe8	nir: Fix end of function without return warning/error. There always is a continue block, so let us just do unreachable. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `8cacf38f52` "nir: Do not use continue block after removing it." CC: 18.1 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107312	2018-07-20 22:27:39 +02:00
Danylo Piliaiev	d24c35c3fb	st: Sweep NIR after linking phase to free held memory After optimization passes and many transformations most of the memory NIR holds is garbage which was being freed only after shader deletion. Freeing it at the end of linking will save memory which would be useful in case there are a lot of complex shaders being compiled. The common case for this issue is 32bit game running under Wine. The cost of the optimization is around ~3-5% of compilation speed with complex shaders. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-07-20 11:26:12 -07:00
Eric Anholt	945524ba0e	st/dri: Don't require a dri_format for image creation. Nothing in EGL_KHR_gl_image.txt seems to let us deny creation based on formats, and doing so causes many failures in dEQP-EGL.functional.image.api.* The NONE value we were protecting from only gets looked at in the __DRI_IMAGE_ATTRIB_FORMAT and __DRI_IMAGE_ATTRIB_FOURCC queries, which are used from wayland and gbm (which throw an error cleanly on unknown format) and DMABUF export. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-20 11:26:12 -07:00
Eric Anholt	f6750456c5	egl: Refuse EGL_MESA_image_dma_buf_export if we don't have a DRM fourcc. The EGL CTS expects that you can make images from all sorts of things, including things like z16 and s8, which we don't have DRM fourccs for. Just return an error when trying to export one of those. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-20 11:26:12 -07:00
Eric Anholt	a221f9709e	v3d: Fix incorrect handling of two fences created back-to-back. Recreating our context's syncobj with ALREADY_SIGNALED meant that if you created two fences in a row, then waiting on the second would succeed immediately. Instead, export a sync file in the gallium fence (since we don't have a syncobj clone ioctl), and just create a new syncobj to wait on whenever we need to. Noticed while debugging dEQP-GLES3.functional.fence_sync.client_wait_sync_finish	2018-07-20 11:11:29 -07:00
Eric Anholt	fc28692a5a	v3d: Fix the timeout value passed to drmSyncobjWait(). The API wants an absolute time, so we need to go add gallium's argument to CLOCK_MONOTONIC.	2018-07-20 11:11:29 -07:00
Eric Anholt	4f04bd68cf	v3d: Fix drmSyncobjWait() return value checking even more. It tends to return >0 in the success case (I think the value is something like "how much of the timeout remained"). Fixes dEQP-GLES3.functional.fence_sync.client_wait_sync_finish	2018-07-20 11:11:29 -07:00
Eric Anholt	2f90879a34	v3d: Use the list_first_entry/list_last_entry macros.	2018-07-20 11:11:29 -07:00
Eric Anholt	d0e53373e5	v3d: Move BO cache counting to dump time instead of cache management. This is one less way to get the dump stats wrong.	2018-07-20 11:11:29 -07:00
Eric Anholt	7d6aef6fa5	v3d: Reduce the stale BO reclamation spam with dump_stats set. This was obviously meant to be when we were actually freeing a BO, not just when there was at least one BO in the list.	2018-07-20 11:11:29 -07:00
Eric Anholt	5d11094db1	v3d: Respect a sampler view's first_layer field. Fixes texturing from EGL images created from cubemap faces, as in dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture	2018-07-20 11:11:29 -07:00
Sonny Jiang	c6737756ad	radeonsi: emit_spi_map packets optimization v2: marek: remove an empty line before break; rename reg_val_seq -> spi_ps_input_cntl "type * x" -> "type *x" Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-07-20 13:50:26 -04:00
Gert Wollny	4d094993c3	virgl: Expose GL_ARB_copy_image if host supports it Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-07-20 19:15:12 +02:00
Gert Wollny	0bde9739c0	virgl: Allow RGB32* textures only as buffer objects When requesting a texture of the internal format GL_RGB32F Gallium will try to allocate a renderable texture and returns RGBA32F or RGBX32F, but when one requests GL_RGB32I or GL_RGB32UI the according 3-component texture will be returned. This leads to problems later, when one wants to use glCopyImageSubData to copy data between these textures that should be compatible, but given the way virgl and Gallium handle this the latter fails with an assertion, because the per-texel bit size is different. By allowing the GL_RGB32* only for texture buffers these problems are avoided without losing the ARB_tbo_rgb32 extension (thanks Ilia Mirkin). v2: Correct spelling (Gurchetan Singh) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-07-20 19:12:49 +02:00
Lionel Landwerlin	feb43ef674	intel: tools: dump: protect against multiple calls on destructor When running gdb, make sure to pass the LD_PRELOAD variable only to the executed program, not the debugger. Otherwise the debugger will run the preloaded constructor/destructor too and bad things will happen. Suggested-by: Rafael Antognolli <rafael.antognolli@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-20 17:36:56 +01:00
Lionel Landwerlin	2a9069eb97	intel: tools: dump: make dump tool reliable under gdb The problem with passing the configuration of the dump lib through a file descriptor is that it can be read only once. But under gdb you might want to rerun your program multiple times. This change hands the configuration through a temporary file that is deleted once the command line passes to intel_dump_gpu has exited. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-20 17:36:37 +01:00
Samuel Pitoiset	1efc9094e0	radv: don't flush DB before subpass FS resolves That shouldn't be needed because the DB state is invalid. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-20 17:30:13 +02:00
Gert Wollny	016807161b	r600: Correct evaluation of cube array index and face The array index needs to be corrected and it must be ensured that it is rounded and its value is non-negative before it is combined with the face id. v5: Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin) v6: Fix type (Roland Scheidegger) Fixes 182 from android/cts/master/gles31-master.txt: dEQP-GLES31.functional.texture.filtering.cube_array.formats.* dEQP-GLES31.functional.texture.filtering.cube_array.sizes.* dEQP-GLES31.functional.texture.filtering.cube_array.combinations.nearest_mipmap_* dEQP-GLES31.functional.texture.filtering.cube_array.combinations.linear_mipmap_* dEQP-GLES31.functional.texture.filtering.cube_array.no_edges_visible.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-07-20 14:55:12 +02:00
Gert Wollny	01766c1db6	r600: correct texture offset for array index lookup Correct the array index for TEXTURE_1D_ARRAY, and TEXTURE_2D_ARRAY The standard says the array index is evaluated according to floor(z + 0.5) but RNDNE is sufficient also for the test cases where z is close to 1.5 and it is likely to hit 1.5, the corner case where RNDNE gives a result different from above formula. v5: - Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin) - update commit message Fixes 325 tests from android/cts/master/gles3-master.txt: dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray dEQP-GLES3.functional.shaders.texture_functions.textureoffset.sampler2darray dEQP-GLES3.functional.shaders.texture_functions.texturelod.sampler2darray* dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.sampler2darray dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler2darray dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2darray dEQP-GLES3.functional.texture.filtering.2d_array.formats.* dEQP-GLES3.functional.texture.filtering.2d_array.sizes.* dEQP-GLES3.functional.texture.filtering.2d_array.combinations.* dEQP-GLES3.functional.texture.shadow.2d_array.* dEQP-GLES3.functional.texture.vertex.2d_array.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-07-20 14:55:12 +02:00
Gert Wollny	626bd455d4	r600: Delay emission of texture gradients and lookup offsets Gradients used in texture lookups and the offsets must reside in the same fetch clause (the first is imposed by the hardware and the second is expected by sb). In order to ensure that no ALU clause is inserted between emission and use of these, delay the emission of these instructions until the texture instruction using them is also emitted. This is needed in preparation for the correction of the texture array indices. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-07-20 14:55:12 +02:00
Bas Nieuwenhuizen	cc10b34e9e	util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache. radv always needs it, so just check the header instead. Also do not declare the function if the variable is not set, so we get a nice compile error instead of failing to open a device at runtime. Fixes: `b87ef9e606` "util: fix MSVC build issue in disk_cache.h" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-20 12:09:19 +02:00
Bas Nieuwenhuizen	8cacf38f52	nir: Do not use continue block after removing it. Reinserting code directly before a jump means the block gets split and merged, removing the original block and replacing it in the process. Hence keeping a pointer to the continue block over a reinsert causes issues. This code changes nir_opt_if to simply look for the new continue block. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107275 CC: 18.1 <mesa-stable@lists.freedesktop.org>	2018-07-20 12:09:19 +02:00
Samuel Pitoiset	ce454d02cc	radv: simplify a condition in radv_src_access_flush() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-20 10:17:17 +02:00
Samuel Pitoiset	1ff25c4e6b	radv: save current state just before resolving with FS Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-20 10:17:15 +02:00
Samuel Pitoiset	c3d5f124c6	radv: don't check if a subpass has resolve attachments twice We already check that in radv_cmd_buffer_resolve_subpass(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-20 10:17:13 +02:00
Samuel Pitoiset	0a8127bbfb	radv: make use of radv_subpass_barrier() when resolving subpasses The goal is to use radv_barrier()/radv_subpass_barrier() as much as possible for further optimizations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-20 10:17:11 +02:00
Rhys Perry	409a60df3b	nv50/ir: move LateAlgebraicOpt back to right after ConstantFolding total instructions in shared programs : 5480808 -> 5472107 (-0.16%) total gprs used in shared programs : 647530 -> 647532 (0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58551648 -> 58459352 (-0.16%) local shared gpr inst bytes helped 0 0 73 2609 2609 hurt 0 0 71 34 34	2018-07-19 23:34:58 +02:00
Rhys Perry	2afef231db	nv50/ir: handle SHLADD in IndirectPropagation An alternative solution to the problem fixed in `0bd83d0` ("nv50/ir: move LateAlgebraicOpt to the very end"). total instructions in shared programs : 5481195 -> 5480808 (-0.01%) total gprs used in shared programs : 647535 -> 647530 (-0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58555784 -> 58551648 (-0.01%) local shared gpr inst bytes helped 0 0 2 34 34 hurt 0 0 0 0 0	2018-07-19 23:34:58 +02:00
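
Commit fc28692a5a above notes that drmSyncobjWait() wants an absolute time, so the relative timeout gallium passes has to be added to a CLOCK_MONOTONIC reading. The arithmetic might look roughly like the sketch below. It is not the code from that commit: the helper names `abs_timeout_from_rel` and `monotonic_now_ns` are hypothetical, and clamping to INT64_MAX on overflow is an assumption made here so that a very large relative timeout cannot wrap around.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not from the Mesa tree): turn a relative timeout
 * into an absolute deadline. Assumes now_ns is non-negative. Saturates at
 * INT64_MAX instead of overflowing; that clamp is an assumption of this
 * sketch, not something the commit message states. */
static int64_t
abs_timeout_from_rel(int64_t now_ns, uint64_t rel_ns)
{
    if (rel_ns > (uint64_t)(INT64_MAX - now_ns))
        return INT64_MAX;
    return now_ns + (int64_t)rel_ns;
}

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t
monotonic_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000ll + ts.tv_nsec;
}
```

A caller would compute `abs_timeout_from_rel(monotonic_now_ns(), timeout_ns)` and pass the result as the timeout argument of the wait call.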


103686 commits