fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-28 07:48:20 +02:00

Author	SHA1	Message	Date
Plamena Manolova	c5f3013cba	nir: Don't lower the local work group size if it's variable. If the local work group size is variable it won't be available at compile time so we can't lower it in nir_lower_system_values(). Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-13 10:57:04 +02:00
Timothy Arceri	34dffcf913	nir: add support for removing redundant stores to copy prop var For example the following type of thing is seen in TCS from a number of Vulkan and DXVK games: vec1 32 ssa_557 = deref_var &oPatch (shader_out float) vec1 32 ssa_558 = intrinsic load_deref (ssa_557) () vec1 32 ssa_559 = deref_var &oPatch@42 (shader_out float) vec1 32 ssa_560 = intrinsic load_deref (ssa_559) () vec1 32 ssa_561 = deref_var &oPatch@43 (shader_out float) vec1 32 ssa_562 = intrinsic load_deref (ssa_561) () intrinsic store_deref (ssa_557, ssa_558) (1) /* wrmask=x */ intrinsic store_deref (ssa_559, ssa_560) (1) /* wrmask=x */ intrinsic store_deref (ssa_561, ssa_562) (1) /* wrmask=x */ No shader-db changes on i965 (SKL). vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 7832 -> 7728 (-1.33 %) VGPRS: 6476 -> 6740 (4.08 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 469572 -> 456596 (-2.76 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 989 -> 960 (-2.93 %) Wait states: 0 -> 0 (0.00 %) The Max Waves and VGPRS changes here are misleading. What is happening is a bunch of TCS outputs are being optimised away as they are now recognised as unused. This results in more varyings being compacted via nir_compact_varyings() which can result in more register pressure when they are not packed in an optimal way. This is an existing problem independent of this patch. I've run some benchmarks and haven't noticed any performance regressions in affected games. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 15:19:36 +11:00
Christian Gmeiner	c6aaafa3a1	nir: add lowering for ffloor Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 21:57:25 +01:00
Lionel Landwerlin	8a15f06d19	nir/lower_tex: Add AYUV lowering support Byte ordering is : 0: V 1: U 2: Y 3: A v2: Split refactoring of alpha channel (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1) Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)	2018-11-12 13:22:54 +00:00
Lionel Landwerlin	0a30c33e83	nir/lower_tex: add alpha channel parameter for yuv lowering We're about to introduce AYUV support which provides its own alpha channel. So give alpha as a parameter and set it to 1 on existing formats. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-12 13:22:54 +00:00
Timothy Arceri	a068958692	nir: don't pack varyings ints with floats unless flat Fixes: `1c9c42d16b` ("nir: add varying component packing helpers") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 15:38:56 +11:00
Timothy Arceri	d40dd05553	nir: add new linking opt nir_link_constant_varyings() This pass moves constant outputs to the consuming shader stage where possible. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-10 11:41:00 +11:00
Iago Toral Quiroga	35baee5dce	nir/constant_folding: fix incorrect bit-size check nir_alu_type_get_type_size takes a type as parameter and we were passing a bit-size instead, which did what we wanted by accident, since a bit-size of zero matches nir_type_invalid, which has a size of 0 too. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-09 08:22:15 +01:00
Jason Ekstrand	61e15348c4	nir: Add a read_mask helper for ALU instructions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:22 -06:00
Timothy Arceri	769ae9fb7f	nir: fix condition propagation when src has a swizzle We cannot use nir_build_alu() to create the new alu as it has no way to know how many components of the src we will use. This results in it guessing the max number of components from one of its inputs. Fixes the following CTS tests: dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_geom dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_tessc dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_vert Fixes: `2975422ceb` ("nir: propagates if condition evaluation down some alu chains") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-03 00:44:01 +11:00
Timothy Arceri	c7bdda8aa5	nir: allow propagation of if evaluation for bcsel Shader-db results Skylake: total instructions in shared programs: 13109035 -> 13109024 (<.01%) instructions in affected programs: 4777 -> 4766 (-0.23%) helped: 11 HURT: 0 total cycles in shared programs: 332090418 -> 332090443 (<.01%) cycles in affected programs: 19474 -> 19499 (0.13%) helped: 6 HURT: 4 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-02 15:56:34 +11:00
Timothy Arceri	5b757b4097	nir: fix if condition propagation for alu use We need to update the cursor before we check if the alu use is dominated by the if condition. Previously we were checking if the current location of the alu instruction was dominated by the if condition which would miss some optimisation opportunities. Fixes: `a3b4cb3458` ("nir/opt_if: Rework condition propagation") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-01 09:22:55 +11:00
Eric Anholt	8265dfaa87	nir: Allow using nir_lower_io_to_scalar_early on VS input vars. This will be used on V3D to cut down the size of the VS inputs in the VPM (memory area for sharing data between shader stages). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-30 10:46:52 -07:00
Brian Paul	9007c0ed26	nir: fix yet another MSVC build break Trivial.	2018-10-29 11:15:12 -06:00
Jason Ekstrand	19064b8c3a	nir: Add a pass for gathering transform feedback info This is different from the GL_ARB_spirv pass because it generates a much simpler data structure that isn't tied to OpenGL and mtypes.h. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-29 17:09:08 +01:00
Brian Paul	7e64e39f8b	nir: Fix array initializer Empty initializer is not standard C. This fixes MSVC build. Trivial.	2018-10-26 12:35:48 -06:00
Jason Ekstrand	5bfce5fcc2	nir/system_values: Use the bit size from the load_deref This isn't a great solution for bit-sizes but we don't have a particularly convenient way to get a bit size from the system value enum and this keeps the lowering pass from changing it. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	a3b4cb3458	nir/opt_if: Rework condition propagation Instead of doing our own constant folding, we just emit instructions and let constant folding happen. This is substantially simpler and lets us use the nir_imm_bool helper instead of dealing with the const_value's ourselves. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	4cd8a58595	nir/search: Use the nir_imm_* helpers from nir_builder This requires that we rework the interface a bit to use nir_builder but that's a nice little modernization anyway. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	6e32115bd6	nir/builder: Handle 16-bit floats in nir_imm_floatN_t Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	ff45649bc2	nir/builder: Add a nir_imm_true/false helpers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	249e32ab17	nir/constant_folding: Use nir_src_as_bool for discard_if Missed one while converting to the nir_src_as_* helpers. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	6de1869e86	nir/constant_folding: Add an unreachable to a switch Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	28bb6abd1d	nir/validate: Print when the validation failed Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Eric Engestrom	e27902a261	util: use C99 declaration in the for-loop set_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Eric Engestrom	bb84fa146f	util: use C99 declaration in the for-loop hash_table_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Jose Fonseca	d9a04196d9	nir: Fix array initializer. Empty initializer is not standard C. This fixes MSVC build. Trivial.	2018-10-24 11:37:09 +01:00
Juan A. Suarez Romero	3112da346b	nir: fix nir_copy_propagation test Use nir_src_comp_as_uint() to read the proper second component, as nir_src_as_uint() returns the first one. v2: Use nir_src_comp_as_uint() [Jason] Fixes: `16870de8a0` ("nir: Use nir_src_is_const and nir_src_as_* in core code") Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108532 Tested-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-24 09:13:24 +02:00
Samuel Pitoiset	7c694cbfa4	nir: add linking helper nir_link_xfb_varyings() The linking opts shouldn't try removing or compacting XFB varyings in the consumer. To avoid this we copy the always_active_io flag from the producer. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-24 08:21:29 +11:00
Jason Ekstrand	ecb7775e1c	nir/algebraic: Fix a typo in the bit size validation code The canon_bit_class and canon_var_class variables got switched. Fixes: `932c650e0b` "nir/algebraic: Loosen a restriction on variables" Reported-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-23 12:22:29 -05:00
Jason Ekstrand	bf441d22a7	nir/algebraic: Provide descriptive asserts for bit size checks This will hopefully make debugging opt_algebraic bit-size compile failures easier. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	932c650e0b	nir/algebraic: Loosen a restriction on variables Previously, we would fail if a variable had an assigned but unknown bit size X and we tried to assign it an actual bit size. However, this is ok because, at the time we do the search, the variable does have an actual bit size and it will match X because of the NIR rules. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	ea9e651423	nir/algebraic: A bit of validation refactoring We rename some local variables in validate() to be more readable and plumb the var through to get/set_var_bit_class instead of the var index. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	641f4be8e8	nir/algebraic: Make internal classes str-able Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	6068be543b	nir/algebraic: Generalize an optimization There's nothing boolean about (a | ~a) ~> -1 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	69618a8678	nir/algebraic: Use bool internally instead of bool32 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	16870de8a0	nir: Use nir_src_is_const and nir_src_as_* in core code Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	ce36f412c9	nir/search_helpers: Use nir_src_is_const and friends This not only makes them safe for more bit sizes but it also fixes a bug in is_zero_to_one where it would return true for constant NaN. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	7bae7828aa	nir/search: Use nir_src_is_const and friends Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	bca5c2c688	nir: Add some new helpers for working with const sources Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Dave Airlie	ff281e6204	nir: fix clip cull lowering to not assert if GLSL already lowered. If GLSL has already done the lowering, we'd rather not crash in this pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-15 18:53:48 -07:00
Caio Marcelo de Oliveira Filho	b3c6146925	nir: Copy propagation between blocks Extend the pass to propagate the copies information along the control flow graph. It performs two walks, first it collects the vars that were written inside each node. Then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk. This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible -- as the SSA copy propagation pass won't apply to those. A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merge equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis. The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops). Annotated shader-db results for Skylake: total instructions in shared programs: 15105796 -> 15105451 (<.01%) instructions in affected programs: 152293 -> 151948 (-0.23%) helped: 96 HURT: 17 All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering what other optimizations kick in. In those cases the copies would be propagated after lowering to SSA. In few HELPs we are actually helping doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/. total cycles in shared programs: 566048861 -> 565954876 (-0.02%) cycles in affected programs: 151461830 -> 151367845 (-0.06%) helped: 2933 HURT: 2950 A lot of noise on both sides. total loops in shared programs: 4603 -> 4603 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 11085 -> 11073 (-0.11%) spills in affected programs: 23 -> 11 (-52.17%) helped: 1 HURT: 0 The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them. total fills in shared programs: 23143 -> 23089 (-0.23%) fills in affected programs: 2718 -> 2664 (-1.99%) helped: 27 HURT: 0 All from shaders/dolphin/ubershaders/. LOST: 0 GAINED: 0 The other generations follow the same overall shape. The spills and fills HURTs are all from the same game. shader-db results for Broadwell. total instructions in shared programs: 15402037 -> 15401841 (<.01%) instructions in affected programs: 144386 -> 144190 (-0.14%) helped: 86 HURT: 9 total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) Properly merge the component information in written map instead of replacing. (Jason) Explain how removal from written arrays works. (Jason) Use mode directly from deref instead of getting the var. (Jason) v4: Register the local written mode for calls. (Jason) Prefer cf_node instead of node. (Jason) Clarify that remove inside iteration only works in backward iterations. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	dc349f07b5	nir: Take call instruction into account in copy_prop_vars Calls are not used yet (functions are inlined), but since new code is already taking them into account, do it here too. The convention here and in other places is that no writable memory is assumed to remain unchanged, as well as global variables. Also, explicitly state the modes affected (instead of using the reverse logic) in one of the apply_for_barrier_modes calls. Suggested by Jason. v2: Consider local vars used by a call to be conservative, SPIR-V has such cases. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	797f01c220	nir: Add tests for copy propagation of derefs Also tests for removal of redundant loads, that we currently handle as part of the copy propagation. Note some tests involve multiple blocks and are currently DISABLED because they (expectedly) fail. v2: Add missing DISABLED prefix to "multi block" tests. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	4dfa7adc10	nir: Remove handling of dead writes from copy_prop_vars These are covered by another pass now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	cb126cf67a	nir: Separate dead write removal into its own pass Instead of doing this as part of the existing copy_prop_vars pass. Separation makes it easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions; while the dead write removal depends on information from later instructions ("has any instruction used this deref before it is overwritten?"). Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block. v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio) (all from Jason) Do not cache nir_deref_path. Not worth it for this patch. Clear unused writes when hitting a call instruction. Clean up enumeration of modes for barriers. Move metadata calls to the inner function. v3: For copies, use the vector length to calculate the mask. (all from Jason) Use nir_component_mask_t when applicable. Rename functions for clarity. Consider local vars used by a call to be conservative (SPIR-V has such cases). Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	a02fd7000d	nir: Add tests for dead write elimination Note at the moment the pass called is nir_opt_copy_prop_vars, because dead write elimination is implemented there. Also added tests that involve identifying dead writes in multiple blocks (e.g. the overwrite happens in another block). Those currently fail as expected, so are marked to be skipped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	bbda2a17f7	nir: Add test file for vars related passes Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	c869646b7d	nir: Add nir_imm_ivec2 helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Eric Anholt	7d77fe1bcc	nir: Expose nir_remove_unused_io_vars(). For gallium drivers where you want to do some linking at variant compile time, you don't have the other producer/consumer shader on hand to modify. By exposing the inner function, the driver can have the used varyings in the compiled shader cache key and still do linking. This is also useful for V3D, where the binning shader wants to only output position and TF varyings. We've been removing those after nir_lower_io, but this will be less driver-specific code and let more of the shader get DCEed early in NIR. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-15 17:16:44 -07:00
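
Commit 6068be543b above notes that there is nothing boolean about the rewrite (a | ~a) ~> -1. The sketch below checks that identity in plain C at several bit sizes. It is only an illustration: or_not_is_all_ones() and check_common_bit_sizes() are hypothetical helpers written for this example and are not Mesa or NIR functions, and the bit sizes tried are simply the common integer widths.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Mesa): treat the low `bit_size` bits of
 * `a` as an integer of that width and report whether (a | ~a) has every bit
 * set, i.e. equals -1 at that width. bit_size is expected to be 1..64.
 */
static int or_not_is_all_ones(uint64_t a, unsigned bit_size)
{
    /* Avoid shifting by 64, which is undefined for a 64-bit operand. */
    uint64_t mask = bit_size >= 64 ? UINT64_MAX
                                   : ((UINT64_C(1) << bit_size) - 1);
    uint64_t value = a & mask;
    uint64_t result = (value | ~value) & mask;

    return result == mask;
}

/* Hypothetical helper: assert the identity for one value at common widths. */
static void check_common_bit_sizes(uint64_t a)
{
    static const unsigned sizes[] = { 1, 8, 16, 32, 64 };

    for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        assert(or_not_is_all_ones(a, sizes[i]));
}
```

Calling check_common_bit_sizes() with any value should pass without tripping an assert, since each bit of the result is the OR of a bit and its complement. That is why the identity does not depend on the operand being a boolean.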

1356 commits