fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-28 05:20:23 +01:00

Author	SHA1	Message	Date
Marek Olšák	1807f6cfe9	winsys/amdgpu: enable chaining for compute IBs Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:36:06 -04:00
Marek Olšák	b99bed6246	winsys/amdgpu: reorder chunks, make BO_HANDLES first, IB and FENCE last Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	437d032b7d	winsys/amdgpu: make IBs writable and expose their address Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	2313176817	ac: add REWIND and GDS registers to register headers Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	35cd57df2e	ac: add ac_get_i1_sgpr_mask Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	bfb9287599	ac: add radeon_info::is_pro_graphics Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	64d6cc982d	ac: add radeon_info::marketing_name, replacing the winsys callback Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	9b33465481	tgsi/scan: add uses_drawid Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Kenneth Graunke	77449d7c41	iris: Track valid data range and infer unsynchronized mappings. Applications frequently call glBufferSubData() to consecutive regions of a VBO to append new vertex data. If no data exists there yet, we can promote these to unsynchronized writes, even if the buffer is busy, since the GPU can't be doing anything useful with undefined content. This can avoid a bunch of unnecessary blitting on the GPU. u_threaded_context would do this for us, and in fact prohibits us from doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't hooked that up yet, and it may be useful to disable u_threaded_context when debugging...at which point we'd still want this optimization. At the very least, it would let us measure the benefit of threading independently from this optimization. And it's not a lot of code. Removes most stall avoidance blits in "Total War: WARHAMMER." On my Skylake GT4e at 1920x1080, this appears to improve performance in games by the following (but I did not do many runs for proper statistics gathering): DiRT Rally: +2% (avg), +2% (max); Bioshock Infinite: +3% (avg), +9% (max); Shadow of Mordor: +7% (avg), +20% (max).	2019-04-23 00:24:08 -07:00
Kenneth Graunke	768b17a7ad	iris: Make a resource_is_busy() helper This checks both "is it busy" and "do we have work queued up for it"?	2019-04-23 00:24:08 -07:00
Kenneth Graunke	5ad0c88dbe	iris: Replace buffer backing storage and rebind to update addresses. This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(), as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either of these happen, we swap out the backing storage of the buffer for a new idle BO, allowing us to write to it immediately without stalling or queueing a blit. On my Skylake GT4e at 1920x1080, this improves performance in games: DiRT Rally: +25% (avg), +17% (max); Bioshock Infinite: +22% (avg), +11% (max); Shadow of Mordor: +27% (avg), +83% (max).	2019-04-23 00:24:08 -07:00
Kenneth Graunke	0a082b6560	iris: Make memzone_for_address non-static I want to use this in iris_resource.c.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	72277044e2	iris: Make a gl_shader_stage -> pipe_shader_stage helper function This is probably not the best place for it, but I don't feel like moving the one out of the TGSI translator today, and we already have the other direction here, so...shrug	2019-04-23 00:24:08 -07:00
Kenneth Graunke	b45dff1da8	iris: Rework image views to store pipe_image_view. This will be useful when rebinding images.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	2f60850a3f	iris: Rework UBOs and SSBOs to use pipe_shader_buffer This unifies a bunch of the UBO and SSBO code to use common structures. Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size, which can be useful when filling out the surface state.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	00d4019676	iris: Track bound constant buffers This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS) looking to see if any resources are bound.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	4d12236072	iris: Mark constants dirty on transfer unmap even if no flushes occur I have various conditions in place to try and avoid unnecessary PIPE_CONTROL flushes, especially to batches which may have never used the buffer being mapped. But if we do a CPU map to a bound constant buffer, we still need to mark push constants dirty, even if there's nothing happening in batches that would warrant a flush. Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus (lots of rainbow colored triangles). Fixes lots of blinking elements in "Shadow of Mordor". Fixes missing crowd rendering in "DiRT Rally".	2019-04-23 00:24:08 -07:00
Lionel Landwerlin	b1ba7ffdbd	intel: workaround VS fixed function issue on Gen9 GT1 parts The issue is noticeable in the dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d test where a triangle goes missing when we use the maximum number of URB entries as specified by the documentation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505 Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 13:41:20 +08:00
Matt Turner	4ec258ac3c	intel/compiler: Improve fix_3src_operand() Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will be handled by opt_combine_constants()). Both are already allowed by opt_copy_propagation(). Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think those occur. This is sufficient to allow a pass added in a future commit (fs_visitor::lower_linterp) to avoid emitting extra MOV instructions. I removed the 'src.stride > 1' case because it seems wrong: 3-src instructions on Gen6-9 are align16-only and can only do stride=1 or stride=0. A run through Jenkins with an assert(src.stride <= 1) never triggers, so it seems that it was dead code. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-22 16:54:31 -07:00
Matt Turner	8aae7a3998	intel/compiler: Add unit tests for sat prop for different exec sizes The two new unit tests verify that propagating a saturate between instructions of different exec sizes does not happen. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-22 16:54:21 -07:00
Matt Turner	54d4d34b96	intel/compiler: Use SIMD16 instructions in fs saturate prop unit test Will allow us to test that propagation between instructions of different exec sizes does not happen (in the next commit). The stray-looking change in intervening_dest_write is to adjust the size of the texture result to keep the test functioning identically when the instructions' exec sizes are doubled. Without the change, the texture does not overwrite the destination fully as the unit test intends. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-22 16:54:17 -07:00
Rafael Antognolli	70e03e220c	intel/fs: Remove fs_generator::generate_linterp from gen11+. We now have a lowering pass that will do this at the fs_visitor level, so we can remove this code from gen11+. v2: Reduce size of the "i" array from 4 to 2 (Matt). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Rafael Antognolli	9ea90aae1e	intel/fs: Add a lowering pass for linear interpolation. On gen11, instead of using a PLN instruction, we convert FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the fs_generator code. This patch adds a lowering pass that does the same thing at the fs_visitor. It also drops the usage of NF types, since we don't need the extra precision and it lets us skip the accumulator. With all that, some optimizations will still be run on the generated code, and we should get better scheduling. v2: Update comment about saturation and conditional mod (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Rafael Antognolli	c0504569ea	intel/fs: Move the scalar-region conversion to the generator. Move the scalar-region conversion from the IR to the generator, so it doesn't affect the Gen11 path. We need the non-scalar regioning for a later lowering pass that we are adding. v2: Better commit message (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Rafael Antognolli	0778748eba	intel/fs: Only propagate saturation if exec_size is the same. Otherwise it could propagate the saturation from a SIMD16 instruction into a SIMD8 instruction. With that, only part of the destination register, which is the source of the move with saturation, would have been updated. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:53:55 -07:00
Kenneth Graunke	087f92c59a	i965: Tidy bogus indentation left by previous commit I left code indented one level too far in the previous commit to make the diff easier to review. Drop that extra level now. Fixes: `6981069fc8` i965: Ignore uniform storage for samplers or images, use binding info Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-22 15:41:56 -07:00
Kenneth Graunke	6981069fc8	i965: Ignore uniform storage for samplers or images, use binding info gl_nir_lower_samplers_as_deref creates new top level sampler and image uniforms which have been split from structure uniforms. i965 assumed that it could walk through gl_uniform_storage slots by starting at var->data.location and walking forward based on a simple slot count. This assumed that structure types were walked in a particular order. With samplers and images split out of structures, it becomes impossible to assign meaningful locations. Consider: struct S { sampler2D a; sampler2D b; } s[2]; The gl_uniform_storage locations for these follow this map: 0 => a[0], 1 => b[0], 2 => a[1], 3 => b[1]. But the new split variables look like: sampler2D lowered_a[2]; sampler2D lowered_b[2]; and there is no way to know that there's effectively a stride to get to the location for successive elements of a[] or b[]. So, working with location becomes effectively impossible. Ultimately, the point of looking at uniform storage was to pull out the bindings from the opaque index fields. gl_nir_lower_samplers_as_derefs can obtain this information while doing the splitting, however, and sets up var->data.binding to have the desired values. We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so gl_nir_lower_samplers_as_derefs has the opportunity to set proper image bindings. Then, we make the uniform handling code skip sampler(-array) variables, and handle image param setup based on var->data.binding. Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct, this time without regressing dEQP-GLES2.functional.uniform_api.random.3. Fixes: `f003859f97` nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-22 15:39:55 -07:00
Kenneth Graunke	47303b466c	Revert "glsl: Set location on structure-split sampler uniform variables" This reverts commit `9e0c744f07`, which regressed dEQP-GLES2.functional.uniform_api.random.3. It turns out that the newly produced location is meaningless and impossible to consume by drivers that want to look at gl_uniform_storage, so it's probably better to leave it unset (0) than a number that looks usable. Leave a tombstone^Wcomment to discourage the next person from making the obvious looking fix. See the next commit for a longer description of the problem. This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct on i965, which was originally fixed by the revert. The next commit will fix it again. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-22 15:39:55 -07:00
Marek Olšák	b58e5fb6f3	radeonsi: use CP DMA for the null const buffer clear on CIK This is a workaround for a thread deadlock that I have no idea why it occurs. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879 Fixes: `9b331e462e` Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-22 16:05:52 -04:00
Danylo Piliaiev	f280c36c08	drirc: Add workaround for Epic Games Launcher Epic Games Launcher could be launched in opengl mode with "-opengl" option. It creates 4.4 opengl core context however it uses deprecated functionality e.g. default vertex buffer object. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110462 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-22 16:04:19 -04:00
Kenneth Graunke	1566054459	iris: Track bound and writable SSBOs Marek recently extended pipe->set_shader_buffers() to take an extra writable_bitmask parameter, indicating which SSBOs are writable (some may be bound read-only). We can use this to decide whether to set EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us some cross-batch flushing if the SSBO is used for reading in both the render and compute engines.	2019-04-22 11:31:14 -07:00
Chia-I Wu	e9c5e13344	virgl: clear vertex_array_dirty Clear vertex_array_dirty after the state is emitted. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-22 10:19:47 -07:00
Lubomir Rintel	e983a975c6	gallivm: disable NEON instructions if they are not supported The LLVM project made some questionable decisions about defaults for armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell platforms). On top of that, getHostCPUFeatures() doesn't disable missing machine attributes. Finally, -neon alone is not sufficient to disable emission of NEON instructions. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 09:47:49 -07:00
Lubomir Rintel	bc6bfc861f	gallivm: guess CPU features also on ARM getHostCPUFeatures() is also available on ARM, for even longer time than for x86. Use it -- it potentially enables instructions that may speed things up. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: <mesa-stable@lists.freedesktop.org> Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 09:47:39 -07:00
Kenneth Graunke	36478b9f77	iris: Enable the dual_color_blend_by_location driconf option. This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.	2019-04-22 09:36:36 -07:00
Kenneth Graunke	faa52e328e	iris: Add mechanism for iris-specific driconf options Based on Nicolai's `0f8c5de869`. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-22 09:35:36 -07:00
Jason Ekstrand	ccb25aaeaf	nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref We have a macro for this now; no reason to hand-roll it for derefs. While we're here, move the NIR_DEFINE_CAST for derefs down to where all the other ones are. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-22 15:23:24 +00:00
Jason Ekstrand	2314db10bf	anv,radv: Update release notes for newly implemented extensions A lot has happened in those two drivers since the 19.0 release and we keep forgetting to update release notes. Time to bring everything up to date again before 19.1 gets released. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-22 14:47:23 +00:00
Samuel Pitoiset	b3e3440c87	radv: add VK_NV_compute_shader_derivatives support Only computeDerivativeGroupLinear is supported for now. All crucible tests pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-22 14:51:57 +02:00
Ian Romanick	a6ccc4c0c8	intel/fs: Add support for float16 to the fsign optimizations Commit `ad98fbc217` ("intel/fs: Refactor code generation for nir_op_fsign to its own function") criss-crossed with `c2b8fb9a81` ("anv/device: expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying enough attention when I rebased. This adds back the float16 changes and enables the optimization. v2: Incorporate more changes from `19cd2f5deb` and `a8d8b1a139` that I missed in the previous version. Fixes: `ad98fbc217` ("intel/fs: Refactor code generation for nir_op_fsign to its own function") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]	2019-04-20 20:49:34 -07:00
Icenowy Zheng	3e91c7d544	lima: add Android build Currently only meson build support is added for the lima driver. Add Android build support for lima. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-04-21 01:05:19 +00:00
Andre Heider	8b13aac966	st/nine: skip position checks in SetCursorPosition() For HW cursors, "cursor.pos" doesn't hold the current position of the pointer, just the position of the last call to SetCursorPosition(). Skip the check against stale values and bump the d3dadapter9 drm version to expose this change of behaviour. Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2019-04-20 13:06:29 +02:00
Jason Ekstrand	828ec41154	anv: Rework the descriptor set layout create loop Previously, we were storing the per-binding create info pointer in the immutable_samplers field temporarily so that we can switch the order in which we walk the loop. However, now that we have multiple arrays of structs to walk, it makes more sense to store an index of some sort. Because we want to leave immutable_samplers as NULL for undefined bindings, we store index + 1 and then subtract one later. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 23:26:41 +00:00
Jason Ekstrand	2b388c3d04	anv: Ignore descriptor binding flags if bindingCount == 0 I missed this on the first go round. The bindingCount field of VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero which means the flags array is ignored. Fixes: `d6c9bd6e01` "anv: Put binding flags in descriptor set layouts" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 23:26:41 +00:00
Alyssa Rosenzweig	648cda258b	panfrost/mdg: Use shared fsign lowering Fixes failures in shaders.operator.common_functions.sign.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-19 23:15:57 +00:00
Alyssa Rosenzweig	31d9caa239	panfrost: Fixup vertex offsets to prevent shadow copy Mali attribute buffers have to be 64-byte aligned. However, Gallium enforces no such requirement; for unaligned buffers, we were previously forced to create a shadow copy (slow!). To prevent this, we instead use the offseted buffer's address with the lower bits masked off, and then add those masked off bits to the src_offset. Proof of correctness included, possibly for the opportunity to say "QED" unironically. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-19 22:50:20 +00:00
Alyssa Rosenzweig	e008d4f011	panfrost: Track BO lifetime with jobs and reference counts This (fairly large) patch continues work surrounding the panfrost_job abstraction to improve job lifetime management. In particular, we add infrastructure to track which BOs are used by a particular job (currently limited to the vertex buffer BOs), to reference count these BOs, and to automatically manage the BOs memory based on the reference count. This set of changes serves as a code cleanup, as a way of future proofing for allowing flushing BOs, and immediately as a bugfix to workaround the missing reference counting for vertex buffer BOs. Meanwhile, there are a few cleanups to vertex buffer handling code itself, so in the short-term, this allows us to remove the costly VBO staging workaround, since this patch addresses the underlying causes. v2: Use pipe_reference for BO reference counting, rather than managing it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer counting. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-19 22:50:20 +00:00
Andres Gomez	a151500dd1	docs/relnotes: add support for VK_KHR_shader_float16_int8 v2: radv also supports it now (Samuel Pitoiset). Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-20 00:29:16 +02:00
Jason Ekstrand	9ce7c29724	anv/nir: Add a central helper for figuring out SSBO address formats Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	470422870a	nir: Add helpers for getting the type of an address format Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
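
The panfrost commit 31d9caa239 above describes masking off the low bits of an unaligned vertex buffer address and adding them to src_offset, so that no shadow copy is needed. The following is a minimal sketch of that arithmetic only, not the driver's code: the struct, the function name, and the macro are hypothetical, and the 64-byte alignment is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment requirement, taken from the commit message (64 bytes). */
#define ATTR_BUFFER_ALIGN 64u

/* Hypothetical result type for illustration; not a panfrost structure. */
struct aligned_attr {
    uint64_t base;       /* buffer address with the low bits masked off */
    uint32_t src_offset; /* original offset plus the masked-off bits */
};

/* Split an arbitrary address into an aligned base and a larger offset. */
static struct aligned_attr
fixup_attr_address(uint64_t addr, uint32_t src_offset)
{
    struct aligned_attr out;
    uint64_t low_bits = addr & (ATTR_BUFFER_ALIGN - 1);

    out.base = addr - low_bits;
    out.src_offset = src_offset + (uint32_t)low_bits;

    /* The base is aligned, and base + offset still names the same byte. */
    assert((out.base & (ATTR_BUFFER_ALIGN - 1)) == 0);
    assert(out.base + out.src_offset == addr + src_offset);
    return out;
}
```

Because the masked-off bits are moved into the offset rather than discarded, the effective address is unchanged. This appears to be the property the commit's proof of correctness establishes.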


117072 commits