fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-04-16 06:20:44 +02:00

Author	SHA1	Message	Date
Marek Olšák	f671cc4d95	ac: set swizzled bit in cache policy as a hint not to merge loads/stores LLVM now merges loads and stores for all opcodes, so this must be set. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 16:48:27 -05:00
Eric Anholt	8afab607ac	nir: Add a scheduler pass to reduce maximum register pressure. This is similar to a scheduler I've written for vc4 and i965, but this time written at the NIR level so that hopefully it's reusable. A notable new feature it has is Goodman/Hsu's heuristic of "once we've started processing the uses of a value, prioritize processing the rest of their uses", which should help avoid the heuristic otherwise making such systematically bad choices around getting texture results consumed. Results for v3d: total instructions in shared programs: 6497588 -> 6518242 (0.32%) total threads in shared programs: 154000 -> 152828 (-0.76%) total uniforms in shared programs: 2119629 -> 2068681 (-2.40%) total spills in shared programs: 4984 -> 472 (-90.53%) total fills in shared programs: 6418 -> 1546 (-75.91%) Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2) v2: Use the DAG datastructure, fold in the scheduling-for-parallelism patch, include SSA defs in live values so we can switch to bottom-up if we want. v3: Squash in improvements from Alejandro Piñeiro for getting V3D to successfully register allocate on GLES3.1 dEQP. Make sure that discards don't move after store_output. Comment spelling fix.	2019-11-25 21:12:21 +00:00
Jonathan Marek	5159db60fc	etnaviv: implement 64bpp clear At the same time, update etna_clear_blit_pack_rgba to work with integer formats. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-11-25 20:23:22 +01:00
Jonathan Marek	2214f99c07	etnaviv: avoid using RS for 64bpp formats At the same time, this change allows using BLT for 8bpp formats Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-11-25 20:22:43 +01:00
Christian Gmeiner	92d5e3c692	etnaviv: add support for extended pe formats Use the extended format if an such a format was passed. v1 -> v2: - set FORMAT_MASK bit when using ext PE format as suggested by Wladimir J. van der Laan Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-11-25 20:12:52 +01:00
Christian Gmeiner	396818fd9d	etnaviv: handle 8 byte block in tiling Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-11-25 20:11:30 +01:00
Samuel Pitoiset	2af39c719e	radv: select the depth decompress path based on the aspect mask Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-25 16:29:23 +01:00
Samuel Pitoiset	905c005561	radv: create decompress pipelines for separate depth/stencil layouts No functional changes as the driver still uses the depth+stencil pipeline. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-25 16:29:21 +01:00
Samuel Pitoiset	faa58201f3	radv: rework creation of decompress/resummarize meta pipelines This refactoring will help for creating more decompress pipelines. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-25 16:29:18 +01:00
Samuel Pitoiset	8f0fb38825	radv: set the image view aspect mask before resolves No functional changes, but it will be used to decompress separate depth/stencil aspects. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-25 16:29:16 +01:00
Samuel Pitoiset	9dec90b7bc	radv: set the image view aspect mask during subpass transitions No functional changes because the aspect mask is still not used during image transitions but it will be needed for the separate depth/stencil aspects logic. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-25 16:29:13 +01:00
Rhys Perry	459bc77763	aco: enable load/store vectorizer Totals from affected shaders: SGPRS: 1890373 -> 1900772 (0.55 %) VGPRS: 1210024 -> 1215244 (0.43 %) Spilled SGPRs: 828 -> 828 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 252 -> 252 (0.00 %) dwords per thread Code Size: 81937504 -> 74608304 (-8.94 %) bytes LDS: 746 -> 746 (0.00 %) blocks Max Waves: 230491 -> 230158 (-0.14 %) In NeiR:Automata and GTA V, the code decrease is especially large: -13.79% and -15.32%, respectively. v9: rework the callback function v10: handle load_shared/store_shared in the callback Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)	2019-11-25 13:59:11 +00:00
Rhys Perry	0a759c3be6	nir: add load/store vectorizer tests v7: run nir_opt_algebraic v9: rework the callback function v9: update alignment on all loads/stores, even if they're not vectorized v10: add tests for 64-bit offsets v10: add tests for signed offsets Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)	2019-11-25 13:59:11 +00:00
Rhys Perry	ce9205c03b	nir: add a load/store vectorization pass This pass combines intersecting, adjacent and identical loads/stores into potentially larger ones and will be used by ACO to greatly reduce the number of memory operations. v2: handle nir_deref_type_ptr_as_array v3: assume explicitly laid out types for derefs v4: create less deref casts v4: fix shared boolean vectorization v4: fix copy+paste error in resources_different v4: fix extract_subvector() to pass nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64 v4: rebase v5: subtract from deref/offset instead of scheduling offset calculations v5: various non-functional changes/cleanups v5: require less metadata and preserve more v5: rebase v6: cleanup and improve dependency handling v6: emit less deref casts v6: pass undef to components not set in the write_mask for new stores v7: fix 8-bit extract_vector() with 64-bit input v7: cleanup creation of store write data v7: update align correctly for when the bit size of load/store increases v7: rename extract_vector to extract_component and update comment v8: prevent combining of row-major matrix column acceses v9: rework process_block() to be able to vectorize more v9: rework the callback function v9: update alignment on all loads/stores, even if they're not vectorized v9: remove entry::store_value, since it will not be updated if it's was from a vectorized load v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2 v9: handle nir_intrinsic_scoped_memory_barrier v10: use nir_ssa_scalar v10: handle non-32-bit offsets v10: use signed offsets for comparison v10: improve create_entry_key_from_offset() v10: support load_shared/store_shared v10: remove strip_deref_casts() v10: don't ever pass NULL to memcmp v10: remove recursion in gcd() v10: fix outdated comment v11: use the new nir_extract_bits() v12: remove use of nir_src_as_const_value in resources_different v13: make entry key hash function deterministic v13: simplify mask_sign_extend() v14: add comment in hash_entry_key() about hashing pointers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)	2019-11-25 13:59:11 +00:00
Rhys Perry	b3a3e4d1d2	radv: set alignment for load_ssbo/store_ssbo in meta shaders Otherwise, nir_intrinsic_align() will assert when called on the intrinsics Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 13:59:11 +00:00
Rhys Perry	c14f823ee5	nir: add nir_num_variable_modes and nir_var_mem_push_const These will be useful in the upcoming load/store vectorizer. v11: rebase Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-11-25 13:59:11 +00:00
Connor Abbott	01eb6ef870	aco: Make unused workgroup id's 0 It shouldn't matter, but the 1 was leftover from when it was handled together with workgroup_size and num_work_groups. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	bb78f9b4e4	aco: Use common argument handling Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	e7f4cadd02	radv: Replace supports_spill with explict_scratch_args The former was always true and hence dead code. We will want to explicitly declare the ring offset register with ACO, but we also want to declare the scratch offset too, and we can't try to disable it since ACO also supports spilling and the determination of whether spilling has to happen occurs well after setting up registers. So replace supports_spill with something that will actually be used for ACO. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 14:17:51 +01:00
Connor Abbott	4d6676d78a	aco: Make num_workgroups and local_invocation_ids one argument each To match the LLVM argument setup code. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	a7f1c63442	aco: Split vector arguments at the beginning Due to how LLVM works we have to make some of the FS inputs become vectors, and therefore have to split them early so that they don't take up extra register pressure due to how RA currently works. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	b45c54ff8d	aco: Use radv_shader_args in aco_compile_shader() Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	680b086db1	aco: Constify radv_nir_compiler_options in isel It's already const for everything else. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	66c703b3e8	radv: Move argument declaration out of nir_to_llvm Now it's executed for ACO too. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 14:17:51 +01:00
Connor Abbott	3b143369a5	ac/nir, radv, radeonsi: Switch to using ac_shader_args Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com>	2019-11-25 14:17:10 +01:00
Connor Abbott	9885af3bdf	ac: Add a shared interface between radv, radeonsi, LLVM and ACO ac_shader_args will be similar to ac_shader_abi, except for being free from LLVM-specific concepts and therefore capable of being shared between LLVM and ACO. This will help us accomplish a few different things: - Decouple setting up SGPR and VGPR arguments from translating to LLVM, so that we can reference these arguments in NIR lowering passes, which will let us lower e.g. descriptor sets in NIR. - Stop using radv-specific structures for things like determining the chip generation in ACO. In the end, we should replace ac_shader_abi with this structure + driver-specific lowering passes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 14:12:46 +01:00
Connor Abbott	43da33c169	radv: Rename ac_arg_regfile We'll duplicate this in a header file in the next commit, and then remove the original enum. Just rename it temporarily so that things keep building. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 14:12:46 +01:00
Danylo Piliaiev	29081c671f	drirc: Add glsl_zero_init workaround for GpuTest GiMark benchmark from GpuTest has such code in VS: out vec4 lightDir0; out vec4 lightDir1; ... lightDir0.xyz = lp0 - vVertex.xyz; lightDir1.xyz = lp1 - vVertex.xyz; In FS: float distSqr = dot(lightDir0, lightDir0); So due to the usage of uninitialized .w channel in the dot product, distSqr may become undefined which results in many black dots in the test on Iris. In https://www.geeks3d.com/forums/index.php/topic,6242.0.html developer stated that this benchmark most likely won't be updated. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1919 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-11-25 12:22:37 +02:00
Samuel Pitoiset	d6db858771	meson: only build imgui when needed Only required for Intel tools or the Vulkan overlay layer. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-11-25 07:51:56 +00:00
Samuel Pitoiset	bfb307aea9	ac/llvm: fix the local invocation index for wave32 Fixes dEQP-VK.compute.builtin_var.local_invocation_index with RADV_PERFTEST=cswave32. My initial fix was to lower it but Rhys suggested the shift-right and it's much better like this. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-25 07:25:48 +00:00
Samuel Pitoiset	b99295fb33	radv: disable subgroup shuffle operations on GFX10 They are broken like on GFX6-GFX7. It seems better to disable them instead of enabling a broken feature. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-25 08:03:24 +01:00
Dave Airlie	506e51b856	llvmpipe: initial query buffer object support. (v2) This fails a couple of piglits due to other bugs in llvmpipe, but it adds support for the feature properly. v2: don't reset pipestats, just recalc, fix CI expectation	2019-11-25 12:37:32 +10:00
Timothy Arceri	f54c4e85ce	radv: create a fresh fork for each pipeline compile In order to prevent a potential malicious pipeline tainting our secure compile process and interfering with successive pipelines we want to create a fresh fork for each pipeline compile. Benchmarking has shown that simply forking on each pipeline creation doubles the total time it takes to compile a fossilize db collection. So instead here we fork the process at device creation so that we have a slim copy of the device and then fork this otherwise idle and untainted process each time we compile a pipeline. Forking this slim copy of the device results in only a 20% increase in compile time vs a 100% increase. Fixes: `cff53da3` ("radv: enable secure compile support")	2019-11-25 10:10:14 +11:00
Timothy Arceri	1663bb1f77	radv: add a secure_compile_open_fifo_fds() helper This will be used to create a communication pipe between the user facing device and a freshly forked (per pipeline compile) slim copy of that device. We can't use pipe() here because the fork will not be a direct fork of the user facing process. Instead we use a previously forked copy of the process that was forked at device creation in order to reduce the resources required for the fork and avoid performance issues. Fixes: `cff53da374` ("radv: enable secure compile support")	2019-11-25 10:10:14 +11:00
Timothy Arceri	ef54f15da9	radv: add some infrastructure for fresh forks for each secure compile In the following commits we want to be able to fork an existing lightweight fork created at device creation time. In order for the user facing process to communicate with this new fresh fork we create some members here to hold FIFO file descriptors and a unique id. Here we also add a new fork enum that we use to tell the lightweight process to create a fresh fork. For more information on why we create a fresh fork see the following commits.	2019-11-25 10:10:14 +11:00
Brian Paul	a2689ebcd6	nir: no-op C99 _Pragma() with MSVC This fixes a build failure on MSVC. BTW, it looks like clang supports _Pragma() but I don't know if it understands the "gcc unroll N" directive. Signed-off-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-11-23 10:34:24 -07:00
Michel Zou	02d63ee5a4	disk_cache_get_function_timestamp: check for dladdr instead of dlopen Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-11-23 12:01:11 +01:00
Marek Olšák	ad40715f35	nir/serialize: support any num_components for remaining instructions Only NPOT vectors greater than vec4 use the extra uint32. This is for instructions that share the dest code. load_const and undef already support 1-16 in the header. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	c028449c01	nir/serialize: use 3 unused bits in intrinsic for packed_const_indices Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	3d44aed09e	nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	a2df670b14	nir/serialize: serialize writemask for vec8 and vec16 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	a5c5388234	nir/serialize: serialize swizzles for vec8 and vec16 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	f1a48d54ea	nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	487a495cc0	nir/serialize: remove up to 3 consecutive equal ALU instruction headers vec4 scalarized ALUs typically have 4 equal instruction headers, so remove the last 3. There are no bits left in the ALU header for more flags, so future extensions of NIR will have to use something like instr_type == 15 to describe more complex ALU instructions. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	c3fa9de2a9	nir/serialize: try to pack both deref array src into 32 bits Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	ed6b01d5e0	nir/serialize: cleanup - fold nir_deref_type_var cases into switches Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	a0cd67d292	nir/serialize: try to put deref->var index into the unused bits of the header Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	ca201bfe70	nir/serialize: don't serialize mode for deref non-cast instructions It can be derived from src and var. This frees 10 bits in the header that will be used later. "mode" is moved in the structure, because those bits will be used for something else later. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	2286340fde	nir/serialize: don't store deref types if not needed - type_cast: deduplicate types if the last one is the same - derive the type from the parent for other derefs Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00
Marek Olšák	70a7f85149	nir/serialize: try to pack two alu srcs into 1 uint32 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-23 00:02:10 -05:00

1 2 3 4 5 ...

108867 commits