fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-25 04:20:08 +01:00

Author	SHA1	Message	Date
Marek Olšák	ff71fae440	nir: strip as we serialize to remove the nir_shader_clone call Serializing stripped NIR is faster now. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-11-21 18:49:57 -05:00
Jason Ekstrand	b8d45d9307	nir: Add tests for nir_extract_bits	2019-11-11 17:17:02 +00:00
Rob Clark	5e08f070f0	nir: add nir_lower_amul pass Lower amul to either imul or imul24, depending on whether 24b is enough bits to calculate an offset within the thing being dereferenced. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-10-18 15:08:54 -07:00
Erik Faye-Lund	878c94288a	nir: add lowering-pass for point-size mov Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-10-17 10:41:36 +02:00
Dave Airlie	dc91a02a72	nir: add a pass to lower flat shading. This takes any color or backcolor that has unspecified shading and converts it to flat shading. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-10-17 10:41:36 +02:00
Marek Olšák	3340c066a1	nir: move gl_nir_opt_access from glsl directory Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-10 15:49:18 -04:00
Eric Engestrom	7a1dc6ab44	meson: rename libnir to _libnir to make it clear it's not meant to be used anywhere else Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-10-07 21:49:40 +01:00
Timur Kristóf	610cc3089c	nir: Carve out nir_lower_samplers from GLSL code. Lowering samplers is needed to produce NIR that can actually be consumed by some gallium drivers, so it doesn't make sense to keep it only in the GLSL code. This commit introduces nir_lower_samplers to compiler/nir, while maintaining the GL-specific function too. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-06 12:20:20 +03:00
Daniel Schürmann	df86c5ffb3	nir: add divergence analysis pass. This pass expects the shader to be in LCSSA form. The algorithm is based on 'The Simple Divergence Analysis' from Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira. Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-20 17:40:13 +02:00
Eric Engestrom	a3d6024199	meson: add nir tests to the compiler/nir test suite Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-14 22:17:06 +01:00
Iago Toral Quiroga	48f5c34301	nir: add a pass to clamp gl_PointSize to a range The OpenGL and OpenGL ES specs require that implementations clamp the value of gl_PointSize to an implementation-dependent range. This pass is useful for any GPU hardware that doesn't do this automatically for either one or both sides of the range, such as V3D. v2: - Turn into a generic NIR pass (Eric). - Make the pass work before lower I/O so we can use the deref variable to inspect if we are writing to gl_PointSize (Eric). - Make the pass take the range to clamp as parameter and allow it to clamp to both sides of the range or just one side. - Make the pass report progress. v3: - Fix copyright header (Eric) - use fmin/fmax instead of bcsel to clamp (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 09:44:12 +02:00
Rhys Perry	7740149852	nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo v2: add to series v3: update Makefile.sources v4: don't remove a comment and break statement v4: use nir_can_move_instr Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Rhys Perry	da8ed68aca	nir: replace nir_move_load_const() with nir_opt_sink() This is mostly the same as nir_move_load_const() but can also move undef instructions, comparisons and some intrinsics (being careful with loops). v2: actually delete nir_move_load_const.c v3: fix nir_opt_sink() usage in freedreno v3: update Makefile.sources v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def v4: handle if uses v4: fix handling of nested loops v5: re-write adjust_block_for_loops v5: re-write setting of use_block for if uses Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Ian Romanick	405de7ccb6	nir/range-analysis: Rudimentary value range analysis pass Most integer operations are omitted because dealing with integer overflow is hard. There are a few things that could be smarter if there was a small amount more tracking of ranges of integer types (i.e., operands are Boolean, operand values fit in 16 bits, etc.). The changes to nir_search_helpers.h are included in this patch to simplify reordering the changes to nir_opt_algebraic.py. v2: Memoize range analysis results. Without this, some shaders appear to get stuck in infinite loops. v3: Rebase on many months of Mesa changes, including 1-bit Boolean changes. v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction". v5: Use nir_alu_srcs_equal for detecting (a*a). Previously just the SSA value was compared, and this incorrectly matched (a.x*a.y). v6: Many code improvements including (but not limited to) better names, more comments, and better use of helper functions. All suggested by Caio. Rework the handling of several opcodes to use a table for mapping source ranges to a result range. This change fixed a bug that caused fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero. Slightly tighten the range of fmul by recognizing that x*x is gt_zero if x is gt_zero. Add similar handling for -x*x. v7: Use _______ in the tables as an alias for unknown. Suggested by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Eric Engestrom	d2d85b950d	meson: replace libmesa_util with idep_mesautil This automates the include_directories and dependencies tracking so that all users of libmesa_util don't need to add them manually. Next commit will remove the ones that were only added for that reason. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Eric Anholt <eric@anholt.net> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-03 00:08:37 +00:00
Jonathan Marek	bc3b6168ba	nir: replace lower_sincos with algebraic opt This version has less ops for the same precision. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Ian Romanick	b08d704051	nir: Add unit tests for nir_opt_comparison_pre Each test has a comment with the expected before and after NIR. The tests don't actually check this. The tests only check whether or not the optimization pass reported progress. I couldn't think of a robust, future-proof way to check the before and after code. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Daniel Schürmann	c31f470066	anv,nir: Move lower_input_attachments pass from ANV to NIR. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:02:50 +02:00
Rob Clark	5787a2dfe3	nir: add pass to lower load_interpolated_input Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Connor Abbott	47e7c6961a	nir: add a vectorization pass This effectively does the opposite of nir_lower_alus_to_scalar, trying to combine per-component ALU operations with the same sources but different swizzles into one larger ALU operation. It uses a similar model as CSE, where we do a depth-first approach and keep around a hash set of instructions to be combined, but there are a few major differences: 1. For now, we only support entirely per-component ALU operations. 2. Since it's not always guaranteed that we'll be able to combine equivalent instructions, we keep a stack of equivalent instructions around, trying to combine new instructions with instructions on the stack. The pass isn't comprehensive by far; it can't handle operations where some of the sources are per-component and others aren't, and it can't handle phi nodes. But it should handle the more common cases, and it should be reasonably efficient. [Alyssa: Rebase on latest master, updating with respect to typeless moves] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-18 06:43:30 -07:00
Ian Romanick	3ee2e84c60	nir: Rematerialize compare instructions On some architectures, Boolean values used to control conditional branches or conditional selection must be propagated into a flag. This generally means that a stored Boolean value must be compared with zero. Rather than force the generation of extra compares with zero, re-emit the original comparison instruction. This can save register pressure by not needing to store the Boolean value. There are several possible areas for future improvement to this pass: 1. Be more conservative. If both sources to the comparison instruction are non-constants, it may be better for register pressure to emit the extra compare. The current shader-db results on Intel GPUs (next commit) lead me to believe that this is not currently a problem. 2. Be less conservative. Currently the pass requires that all users of the comparison match the pattern. The idea is that after the pass is complete, no instruction will use the resulting Boolean value. The only uses will be of the flag value. It may be beneficial to relax this requirement in some cases. 3. Be less conservative. Also try to rematerialize comparisons used for discard_if intrinsics. After changing the way the Intel compiler generates code for discard_if (see MR!935), I tried implementing this already. The changes were pretty small. Instructions were helped in 19 shaders, but, overall, cycles were hurt. A commit "nir: Rematerialize comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit. 4. Copy the preceding ALU instruction. If the comparison is a comparison with zero, and it is the only user of a particular ALU instruction (e.g., (a+b) != 0.0), it may be a further improvement to also copy the preceding ALU instruction. On Intel GPUs, this may enable cmod propagation to make additional progress. v2: Use much simpler method to get the prev_block for an if-statement. Suggested by Tim. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-31 08:47:03 -07:00
Vasily Khoruzhick	e67e4e90b2	nir: implement lowering for fsin and fcos Lower sin and cos using Nick's fast sin/cos approximation from https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648 It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 15:25:21 +00:00
Ian Romanick	158370ed2a	nir/flrp: Add new lowering pass for flrp instructions This pass will soon grow to include some optimizations that are difficult or impossible to implement correctly within nir_opt_algebraic. It also includes the ability to generate strictly correct code which the current nir_opt_algebraic lowering lacks (though that could be changed). v2: Document the parameters to nir_lower_flrp. Rebase on top of `3766334923` ("compiler/nir: add lowering for 16-bit flrp") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Vasily Khoruzhick	443c5a3cd6	nir: add int_to_float lowering pass This new pass lowers ints and bools to floats. It allows hardware that doesn't have native integers (e.g. Mali4x0) use the same code paths as modern hardware. It uses newly introduced pass to gather SSA types and should be used as late as possible. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 01:07:27 +00:00
Jason Ekstrand	91899495a1	nir: Add a SSA type gathering pass This new pass (which isn't even compile-tested) attempts to determine the ALU type of all the SSA values in a function impl. It takes a greedy approach and assigns intness or floatness to everything it thinks can possibly contain an int or a float. Some values will be labeled as both int and float and some will be labeled as neither and it is up to the caller to decide what to do with this information. However, for a "nice" shader where the original source contained no bit-casts and no implicit bit-casts were introduced by optimizations, there shouldn't be any overlap in the two sets save for the odd CSEd zero constant. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-04 03:52:05 +00:00
Rob Clark	a99c360a46	nir: add pass to lower fb reads Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Andreas Baierl	b82de2b4d7	nir: add rcp(w) lowering for gl_FragCoord On some hardware (e.g. Mali400) the shader needs to apply some transformations for correct gl_FragCoord handling. The lowering actions look like the following in pseudocode: gl_FragCoord.xyz = gl_FragCoord_orig.xyz gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w Add this lowering as a nir pass in preparation for using it in the driver. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-29 02:46:44 +00:00
Alyssa Rosenzweig	2ce4adefa5	nir: Add nir_lower_viewport_transform On Mali hardware (supported by Panfrost and Lima), the fixed-function transformation from world-space to screen-space coordinates is done in the vertex shader prior to writing out the gl_Position varying, rather than in dedicated hardware. This commit adds a shared NIR pass for implementing coordinate transformation and lowering gl_Position writes into screen-space gl_Position writes. v2: Run directly on derefs before io/vars are lowered to cleanup the code substantially. Thank you to Qiang for this suggestion! v3: Bikeshed continues. v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment. Ian and Qiang's reviews are from v3, but no real functional changes from v4. Rob's review is from v4. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Suggested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-14 19:15:13 +00:00
Jason Ekstrand	18ed82b084	nir: Add a pass for selectively lowering variables to scratch space This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-12 15:59:31 -07:00
Jason Ekstrand	6279074de1	nir: Get rid of global registers We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Ian Romanick	2cf59861a8	nir: Add partial redundancy elimination for compares This pass attempts to detect code sequences like if (x < y) { z = y - x; ... } and replace them with sequences like t = x - y; if (t < 0) { z = -t; ... } On architectures where the subtract can generate the flags used by the if-statement, this saves an instruction. It's also possible that moving an instruction out of the if-statement will allow nir_opt_peephole_select to convert the whole thing to a bcsel. Currently only floating point compares and adds are supported. Adding support for integer will be a challenge due to integer overflow. There are a couple possible solutions, but they may not apply to all architectures. v2: Fix a typo in the commit message and a couple typos in comments. Fix possible NULL pointer deref from result of push_block(). Add missing (-A + B) case. Suggested by Caio. v3: Fix is_not_const_zero to work correctly with types other than nir_type_float32. Suggested by Ken. v4: Add some comments explaining how this works. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:53 -07:00
Ian Romanick	be1cc3552b	nir: Add nir_const_value_negative_equal v2: Rebase on 1-bit Boolean changes. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:52 -07:00
Jason Ekstrand	3bd5457641	nir: Add a lowering pass for non-uniform resource access Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-25 15:00:36 -05:00
Samuel Pitoiset	23d30f4099	spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass This lowering isn't needed for RADV because AMDGCN has two instructions. It will be disabled for RADV in an upcoming series. While we are at it, factorize a little bit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-22 19:41:46 +01:00
Jason Ekstrand	35b8f6f40b	nir: Add a new pass to lower array dereferences on vectors This pass was originally written for lowering TCS output reads and writes but it is also applicable just about anything including UBOs, SSBOs, and shared variables. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 23:10:27 -05:00
Caio Marcelo de Oliveira Filho	822a8865e4	nir: Add a pass to combine store_derefs to same vector v2: (all from Jason) Reuse existing function for the end of the block combinations. Check the SSA values are coming from the right place in tests. Document the case when the store to array_deref is reused. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-13 08:39:16 -07:00
Jason Ekstrand	5ef2b8f1f2	nir: Add a pass for lowering IO back to vector when possible This pass tries to turn scalar and array-of-scalar IO variables into vector IO variables whenever possible. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-12 15:34:06 +00:00
Connor Abbott	5b2ec9c81e	nir: Add a stripping pass for improved cacheability Oftentimes various nir shaders after lowering will be the same, or almost the same. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worst offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-12 10:49:48 +01:00
Karol Herbst	272e927d0e	nir/spirv: initial handling of OpenCL.std extension opcodes Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. v2: add hadd proof from Jason move some of the lowering into opt_algebraic and create new nir opcodes simplify nextafter lowering fix normalize lowering for inf rework upsample to use nir_pack_bits add missing files to build systems v3: split lines of iadd/sub_sat expressions Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Timur Kristóf	909d1f50f3	nir: Move nir_lower_uniforms_to_ubo to compiler/nir. The nir_lower_uniforms_to_ubo function is useful outside of mesa/state_tracker, and in fact is needed to produce NIR for drivers that have the PIPE_CAP_PACKED_UNIFORMS capability. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Jason Ekstrand	2d2737dcfe	nir: Add a bool to float32 lowering pass From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering. ior: fmax instead of fadd allows removing the fsat. inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better with the scalar instruction set. Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-01-14 19:27:06 +00:00
Jason Ekstrand	11dc130779	nir: Add a bool to int32 lowering pass We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	74492ebad9	nir: Add a pass for lowering integer division by constants It's a reasonably well-known fact in the world of compilers that integer divisions by constants can be replaced by a multiply, an add, and some shifts. This commit adds such an optimization to NIR for the easiest case of udiv. Other division operations will be added in following commits. In order to provide some additional driver control, the pass takes a minimum bit size to optimize. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-13 17:49:48 +00:00
Dylan Baker	6d3cbbbe15	meson: Add nir_algebraic_parser_test to suites Just to make it easier to run the nir tests together. Fixes: `a0ae12ca91` ("nir/algebraic: Add unit tests for bitsize validation") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-10 09:14:44 -08:00
Connor Abbott	a0ae12ca91	nir/algebraic: Add unit tests for bitsize validation The non-failure path can be tested by just compiling mesa and then testing it, but the failure paths won't be hit unless you make a mistake, so it's best to test them with some unit tests. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-05 17:57:40 +01:00
Dylan Baker	a999798daa	meson: Add tests to suites Meson test has a concept of suites, which allow tests to be grouped together. This allows for only a subset of tests to be run (say only the tests for nir). A test can be added to more than one suite, but for the most part I've only added a test to a single suite, though I've added a compiler group that includes nir, glsl, and glcpp tests. To use this you'll need to invoke meson test directly, instead of ninja test (which always runs all targets). It can be invoked as: `meson test -C builddir --suite $suitename` (meson test has additional options that are pretty useful). Tested-By: Gert Wollny <gert.wollny@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-20 09:09:22 -08:00
Jason Ekstrand	19064b8c3a	nir: Add a pass for gathering transform feedback info This is different from the GL_ARB_spirv pass because it generates a much simpler data structure that isn't tied to OpenGL and mtypes.h. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-29 17:09:08 +01:00
Caio Marcelo de Oliveira Filho	cb126cf67a	nir: Separate dead write removal into its own pass Instead of doing this as part of the existing copy_prop_vars pass. Separation makes it easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions; while the dead write removal depends on information from later instructions ("has any instruction used this deref before overwriting it?"). Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block. v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio) (all from Jason) Do not cache nir_deref_path. Not worth it for this patch. Clear unused writes when hitting a call instruction. Clean up enumeration of modes for barriers. Move metadata calls to the inner function. v3: For copies, use the vector length to calculate the mask. (all from Jason) Use nir_component_mask_t when applicable. Rename functions for clarity. Consider local vars used by a call to be conservative (SPIR-V has such cases). Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	bbda2a17f7	nir: Add test file for vars related passes Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Jason Ekstrand	53072582dc	nir: Add an array copy optimization This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for an OpLoad of a large composite followed by OpStore. It can always be improved later if needed. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:47:47 -05:00
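
Commit `74492ebad9` above notes that an unsigned division by a constant can be replaced by a multiply, an add, and some shifts. The following is a minimal standalone C sketch of that idea for two fixed divisors. It is not the code of the NIR pass: the function names (`umul_high`, `udiv3`, `udiv7`, `count_mismatches`) are hypothetical and chosen for illustration, and the constants are the usual 32-bit "magic numbers" for 3 and 7.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of the 64-bit product a * b (hypothetical helper). */
static uint32_t umul_high(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* n / 3: the multiplier ceil(2^33 / 3) = 0xAAAAAAAB fits in 32 bits,
 * so a high multiply followed by one more shift is enough. */
static uint32_t udiv3(uint32_t n)
{
   return umul_high(n, 0xAAAAAAABu) >> 1;
}

/* n / 7: the multiplier ceil(2^35 / 7) needs 33 bits. Only its low
 * 32 bits (0x24924925) are multiplied; the implicit top bit is handled
 * by the add, written as ((n - q) >> 1) + q so it cannot overflow. */
static uint32_t udiv7(uint32_t n)
{
   uint32_t q = umul_high(n, 0x24924925u);
   uint32_t t = ((n - q) >> 1) + q;
   return t >> 2;
}

/* Compare both routines against the native division on a sparse sweep
 * of the 32-bit range plus the maximum value; returns the number of
 * mismatches found. */
static unsigned count_mismatches(void)
{
   unsigned bad = 0;
   for (uint64_t i = 0; i <= UINT32_MAX; i += 65521) {
      uint32_t n = (uint32_t)i;
      bad += (udiv3(n) != n / 3);
      bad += (udiv7(n) != n / 7);
   }
   bad += (udiv3(UINT32_MAX) != UINT32_MAX / 3);
   bad += (udiv7(UINT32_MAX) != UINT32_MAX / 7);
   return bad;
}
```

The divisor 7 shows why the commit message mentions an add: when the multiplier does not fit in the operand width, one extra add and shift recover the missing bit. Divisors like 3 need only the multiply and a shift.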


86 commits
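
The rewrite described in commit `2cf59861a8` (partial redundancy elimination for compares) can be illustrated outside NIR with two plain C functions. This is only a sketch of the before and after shapes quoted in the commit message, assuming IEEE 754 floats without flush-to-zero; the function names and the `otherwise` parameter are hypothetical and exist only to make the example self-contained.

```c
#include <assert.h>

/* Shape before the pass: compare first, subtract inside the branch. */
static float diff_if_less_before(float x, float y, float otherwise)
{
   float z = otherwise;
   if (x < y)
      z = y - x;
   return z;
}

/* Shape after the pass: one subtract feeds both the comparison
 * against zero and, negated, the value used inside the branch. */
static float diff_if_less_after(float x, float y, float otherwise)
{
   float t = x - y;
   float z = otherwise;
   if (t < 0.0f)
      z = -t;
   return z;
}
```

On hardware where the subtract can set the flags consumed by the branch, the second form needs no separate compare instruction, which is the saving the commit message describes.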