fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-17 03:08:07 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	1d10d17817	nir/lower_shader_calls: add an option structure for future optimizations Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	d0543bfbec	nir/lower_shader_calls: cleanup shaders a bit more post split Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	6d7e04d924	nir/lower_shader_calls: add NIR_PASS_V internally Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	dc70519c8a	nir/lower_shader_calls: rematerialize values in more complex cases Previously when considering whether to rematerialize or spill/fill ssa_1954, we would go for a spill/fill : vec4 32 ssa_388 = (float32)txf ssa_387 (texture_handle), ssa_86 (coord), ssa_23 (lod), 0 (texture), 0 (sampler) ... vec1 32 ssa_1953 = load_const (0xbd23d70a = -0.040000) vec1 32 ssa_1954 = fadd ssa_388.x, ssa_1953 vec1 32 ssa_1955 = fneg ssa_1954 This is because when looking at ssa_1955 the first time, we would consider ssa_388 unrematerializable, and therefore all values built on top of it would be considered unrematerializable as well. The missing piece when considering whether to rematerialize ssa_1954 is that we should look at filled values. Now that ssa_388 has been spilled/filled, we can rebuild ssa_1955 on top of the filled value and avoid spilling/filling ssa_1955 at all. This requires a bit more work though. We can't just look at an instruction in isolation; we need to go through the ssa chains until we find values we can rematerialize or not. In this change we build a list of all ssa values involved in building a given value, up to the point where we find a filled or a rematerializable value. In this particular case, looking at ssa_1955: * We can rematerialize ssa_388 from its filled value * We can rematerialize ssa_1953 trivially * We can rematerialize ssa_1954 because its 2 inputs are rematerializable * We can rematerialize ssa_1955 because ssa_1954 is rematerializable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	ca2a1340a2	nir/lower_shader_calls: avoid respilling values Currently we do something like this : ssa_0 = ... ssa_1 = ... * spill ssa_0, ssa_1 call1() * fill ssa_0, ssa_1 ssa_2 = ... ssa_3 = ... * spill ssa_0, ssa_1, ssa_2, ssa_3 call2() * fill ssa_0, ssa_1, ssa_2, ssa_3 If we assign the same position to ssa_0 & ssa_1 in the spilling stack, then on call2(), we know that those values are already present in memory at the right location and we can avoid respilling them. The result would be something like this : ssa_0 = ... ssa_1 = ... * spill ssa_0, ssa_1 call1() * fill ssa_0, ssa_1 ssa_2 = ... ssa_3 = ... * spill ssa_2, ssa_3 call2() * fill ssa_0, ssa_1, ssa_2, ssa_3 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	5a9f8d21d0	nir/lower_shader_calls: lower scratch access to format internally For a follow-up optimization, we would like to track scratch loads. This isn't possible with global load/store intrinsics. So use a couple of special intrinsics in the pass and only lower them to global intrinsics at the end. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	df685b4f9c	nir/lower_shader_calls: rematerialize more trivial values Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Rhys Perry	382831c986	radv,nir: add intrinsics for streamout and GS copy shaders Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19302>	2022-10-25 17:35:08 +00:00
Qiang Yu	7fb506d068	nir: add nir_load_prim_xfb_query_enabled_amd Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>	2022-10-25 12:58:43 +00:00
Qiang Yu	a119a6464f	nir,ac,radv: add primitive count add intrinsics radeonsi uses a shader buffer, but radv uses GDS for the query result storage. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>	2022-10-25 12:58:43 +00:00
Qiang Yu	83643e4dc8	nir,ac/nir/ngg,radv: split shader_query_enabled_amd These are used by different counters. Vulkan: 1. VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT, sum generated primitives of all 4 streams when GS. 2. VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, count generated primitives for all 4 streams when VS/TES/GS. 3. VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT, count generated and streamout primitives for all 4 streams when VS/TES/GS. OpenGL: 1. GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB, sum generated primitives for all 4 streams when GS. 2. GL_PRIMITIVES_GENERATED, count generated primitives for all 4 streams when VS/TES/GS. 3. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, count streamout primitives for all 4 streams when VS/TES/GS. pipeline_stat_query_enabled_amd is for Vulkan 1 and OpenGL 1. xfb_query_enabled_amd is for Vulkan 2/3 and OpenGL 2/3. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>	2022-10-25 02:42:52 +00:00
Rob Clark	a8e84f50bc	nir: Add helper to create passthrough TCS shader Based on si_create_passthrough_tcs() as that seemed the most generic of the various different backend driver implementations. Uses the load_tess_level_outer_default and load_tess_level_inner_default intrinsics to load the gl_TessLevelOuter and gl_TessLevelInner values, so the driver will somehow need to implement those to load the values set by pipe_context::set_tess_state() or similar. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>	2022-10-24 21:39:38 +00:00
Alyssa Rosenzweig	80de33cf6a	nir/opt_preamble: Move load_texture_base_agx nir_opt_preamble will be crucial to optimize out the lowering for array textures on AGX, which involves this AGX-specific sysval. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>	2022-10-22 14:48:04 -04:00
Georg Lehmann	741dbadae0	nir: Fix ifind_msb_rev constant folding. For example if src0 is 0x80000000 we should return 1, not 0. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `a5747f8ab3` ("nir: add opcodes for *find_msb_rev and lowering") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:55 +02:00
Georg Lehmann	125741dbae	nir/opt_algebraic: Optimize various find_msb_rev patterns. From dxvk, dxil-spirv, fxc, dxc and others. Totals from 177 (0.13% of 134913) affected shaders: CodeSize: 1079504 -> 1059872 (-1.82%) Instrs: 195381 -> 192269 (-1.59%) Latency: 3664137 -> 3631951 (-0.88%) InvThroughput: 599479 -> 585675 (-2.30%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:33 +02:00
Georg Lehmann	7505be3497	nir/opt_algebraic: Add an option to lower uclz. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:10 +02:00
Georg Lehmann	1e552b9c95	nir/opt_algebraic: Mirror optimizations for find_msb_rev. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:56:44 +02:00
Emma Anholt	22f7f167cd	nir/opt_phi_precision: Fix missing swizzles when narrowing phi srcs. This NIR: vec4 32 ssa_169 = phi block_1: ssa_168, block_2: ssa_138 vec1 16 ssa_209 = f2fmp ssa_169.x vec1 16 ssa_210 = f2fmp ssa_169.y vec1 16 ssa_211 = f2fmp ssa_169.z vec1 16 ssa_212 = f2fmp ssa_169.w vec4 16 ssa_213 = vec4 ssa_209, ssa_210, ssa_211, ssa_212 intrinsic store_output (ssa_213, ssa_171) (base=0, wrmask=xyzw /15/, component=0, src_type=float16 /144/, io location=4 slots=1 mediump /8388740/, xfb() /0/, xfb2() /0/) would turn into: vec4 32 ssa_169 = phi block_1: ssa_168, block_2: ssa_138 vec4 16 ssa_216 = phi block_1: ssa_214, block_2: ssa_215 vec1 16 ssa_209 = f2fmp ssa_169.x vec1 16 ssa_210 = f2fmp ssa_169.y vec1 16 ssa_211 = f2fmp ssa_169.z vec1 16 ssa_212 = f2fmp ssa_169.w vec4 16 ssa_213 = vec4 ssa_216.x, ssa_216.x, ssa_216.x, ssa_216.x intrinsic store_output (ssa_213, ssa_171) (base=0, wrmask=xyzw /15/, component=0, src_type=float16 /144/, io location=4 slots=1 mediump /8388740/, xfb() /0/, xfb2() /0/) ignoring the swizzles from the f2fmp srcs. Fixes failures in dEQP-GLES2.functional.shaders.random.all_features.fragment.20 on turnip+ANGLE. Fixes: `c7b935962b` ("nir: Add pass to lower phi precision") Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19179>	2022-10-22 03:06:31 +00:00
Timur Kristóf	e52c2f4fca	nir, ac, aco: Add index src to load_buffer_amd/store_buffer_amd. Also modify all existing uses to pass a zero to this new src. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>	2022-10-20 20:00:50 +00:00
Timur Kristóf	c918f0934e	nir, ac, aco: Add ACCESS intrinsic index to load/store_buffer_amd. Previously, we always treated these as coherent, but now let's make this configurable. Also set all current users to ACCESS_COHERENT. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>	2022-10-20 20:00:49 +00:00
Rhys Perry	1ae73bc076	nir/algebraic: optimize b<<a + c<<a fossil-db (navi21): Totals from 248 (0.18% of 135636) affected shaders: Instrs: 85836 -> 85611 (-0.26%); split: -0.27%, +0.00% CodeSize: 481304 -> 480332 (-0.20%); split: -0.21%, +0.00% Latency: 9596559 -> 9596152 (-0.00%); split: -0.00%, +0.00% InvThroughput: 1423707 -> 1423670 (-0.00%) SClause: 3872 -> 3874 (+0.05%) PreSGPRs: 5034 -> 5038 (+0.08%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19137>	2022-10-20 18:57:23 +00:00
Samuel Pitoiset	09033c7b22	nir: add nir_intrinsic_load_ring_attr_{offset}_amd These intrinsics will be used to lower NGG attributes to memory on GFX11. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19173>	2022-10-20 15:59:44 +00:00
Qiang Yu	58e006b174	nir,ac/llvm,radv: add nir_intrinsic_load_provoking_vtx_in_prim_amd For radeonsi, which loads this from an argument. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19166>	2022-10-20 06:53:56 +00:00
Karol Herbst	d7156e5d9c	nir/lower_cl_images: set binding Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19151>	2022-10-19 06:27:20 +00:00
Alyssa Rosenzweig	78adf44839	nir/lower_io: Set interpolated_input dest_type ...even for non-pixel interpolation, for consistency. Otherwise backends get funny intrinsics with interpolateAt: vec4 32 ssa_4 = intrinsic load_interpolated_input (ssa_3, ssa_2) (base=1, component=0, dest_type=invalid /0/, io location=33 slots=1 /161/) We know it'll be a float, but backends shouldn't need to special case this. (Or maybe interpolated_input shouldn't have a dest_type index. I'd be ok with that resolution too. But having one and not setting it consistently is wrong.) Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19085>	2022-10-18 21:08:54 +00:00
Yonggang Luo	a9da108c6b	nir: No need to redefine snprintf in nir.h anymore Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18685>	2022-10-18 03:16:00 +00:00
Alyssa Rosenzweig	2c7be4d421	nir: Usher nir_normalize_cubemap_coords into 2022 I stumbled upon this old NIR pass (still in use by intel and broadcom) and noticed how most of the code was NIR boilerplate that we have helpers for. Rewrite the pass to use all the helpers. v2: Fix cube map arrays. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18754>	2022-10-17 20:46:24 +00:00
Alyssa Rosenzweig	fc5c671e87	nir: Fix nir_fmax_abs_vec_comp This failed to take fabs of the first component, implementing an unintended formula that would return the right results in some common cases but is wrong in general: max { x, |y|, |z| } instead of the intended max { |x|, |y|, |z| } Reexpress the implementation to make correctness obvious. Fixes: `272e927d0e` ("nir/spirv: initial handling of OpenCL.std extension opcodes") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18754>	2022-10-17 20:46:24 +00:00
Alyssa Rosenzweig	ac2964dfbd	nir: Be smarter fusing ffma If there is a single use of fmul, and that single use is fadd, it makes sense to fuse ffma, as we already do. However, if there are multiple uses, fusing may impede code gen. Consider the source fragment: a = fmul(x, y) b = fadd(a, z) c = fmin(a, t) d = fmax(b, c) The fmul has two uses. The current ffma fusing is greedy and will produce the following "optimized" code. a = fmul(x, y) b = ffma(x, y, z) c = fmin(a, t) d = fmax(b, c) Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1 fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of one multiply and an add. Depending on the ISA, that could impede scheduling or increase code size. It can also increase register pressure, extending the live range. It's tempting to gate on is_used_once, but that would hurt in cases where we really do fuse everything, e.g.: a = fmul(x, y) b = fadd(a, z) c = fadd(a, t) For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2 fadd. So what we really want is to fuse ffma iff the fmul will get deleted. That occurs iff all uses of the fmul are fadd and will themselves get fused to ffma, leaving fmul to get dead code eliminated. That's easy to implement with a new NIR search helper, checking that all uses are fadd. shader-db results on Mali-G57 [open shader-db + subset of closed]: total instructions in shared programs: 179491 -> 178991 (-0.28%) instructions in affected programs: 36862 -> 36362 (-1.36%) helped: 190 HURT: 27 total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%) cycles in affected programs: 72.02 -> 70.56 (-2.02%) helped: 28 HURT: 1 total fma in shared programs: 1590.47 -> 1582.61 (-0.49%) fma in affected programs: 319.95 -> 312.09 (-2.46%) helped: 194 HURT: 1 total cvt in shared programs: 812.98 -> 813.03 (<.01%) cvt in affected programs: 118.53 -> 118.58 (0.04%) helped: 65 HURT: 81 total quadwords in shared programs: 98968 -> 98840 (-0.13%) quadwords in affected programs: 2960 -> 2832 (-4.32%) helped: 20 HURT: 4 total threads in shared programs: 4693 -> 4697 (0.09%) threads in affected programs: 4 -> 8 (100.00%) helped: 4 HURT: 0 v2: Update trace checksums for virgl due to numerical differences. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>	2022-10-15 17:47:31 +00:00
Gert Wollny	2e50bf19cd	nir: move fusing csel and comparisons to opt_late_algebraic With that, simple comparisons are cleaned up properly. This helps with some tessellation shaders on r600. Shader-db stats R600/Cayman: -------------------------------------------------------------- total dw in shared programs: 1621806 -> 1620884 (-0.06%) dw in affected programs: 41650 -> 40728 (-2.21%) helped: 211 HURT: 4 helped stats (abs) min: 2 max: 26 x̄: 4.46 x̃: 4 helped stats (rel) min: 0.30% max: 9.68% x̄: 2.87% x̃: 2.52% HURT stats (abs) min: 2 max: 8 x̄: 5.00 x̃: 5 HURT stats (rel) min: 0.23% max: 1.67% x̄: 1.02% x̃: 1.09% 95% mean confidence interval for dw value: -4.81 -3.77 95% mean confidence interval for dw %-change: -3.03% -2.57% Dw are helped. total gprs in shared programs: 41192 -> 41182 (-0.02%) gprs in affected programs: 731 -> 721 (-1.37%) helped: 53 HURT: 45 helped stats (abs) min: 1 max: 3 x̄: 1.23 x̃: 1 helped stats (rel) min: 5.88% max: 40.00% x̄: 16.56% x̃: 14.29% HURT stats (abs) min: 1 max: 2 x̄: 1.22 x̃: 1 HURT stats (rel) min: 7.69% max: 40.00% x̄: 19.42% x̃: 20.00% 95% mean confidence interval for gprs value: -0.37 0.16 95% mean confidence interval for gprs %-change: -3.92% 3.85% Inconclusive result (value mean confidence interval includes 0). total alu_groups in shared programs: 203677 -> 203632 (-0.02%) alu_groups in affected programs: 2876 -> 2831 (-1.56%) helped: 68 HURT: 30 helped stats (abs) min: 1 max: 4 x̄: 1.46 x̃: 1 helped stats (rel) min: 0.84% max: 25.00% x̄: 7.48% x̃: 5.41% HURT stats (abs) min: 1 max: 6 x̄: 1.80 x̃: 1 HURT stats (rel) min: 1.98% max: 33.33% x̄: 10.09% x̃: 5.61% 95% mean confidence interval for alu_groups value: -0.81 -0.11 95% mean confidence interval for alu_groups %-change: -4.20% <.01% Alu_groups are helped. total loops in shared programs: 72 -> 72 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cf in shared programs: 88230 -> 88233 (<.01%) cf in affected programs: 71 -> 74 (4.23%) helped: 1 HURT: 4 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.89% max: 33.33% x̄: 17.14% x̃: 16.67% 95% mean confidence interval for cf value: -0.51 1.71 95% mean confidence interval for cf %-change: -24.20% 38.29% Inconclusive result (value mean confidence interval includes 0). total stack in shared programs: 3827 -> 3827 (0.00%) stack in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 45.32 -> 41.69 (-8.01%) -------------------------------------------------------------- v2: Simplify replacement pattern (Rhys Perry) v3: fix ws (Alexander Orzechowski) v4: move the original lowering to opt_late_algebraic and drop cleanup code (Alyssa) v5: Add shader-sb stats (Alyssa) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18970>	2022-10-14 13:08:15 +00:00
Lionel Landwerlin	eec49374b0	nir: fix NIR_DEBUG=validate_ssa_dominance validate_ssa_def_dominance() asserts : validate_assert(state, !BITSET_TEST(state->ssa_defs_found, def->index)); Because the previous validation left bits set when it processed the IR. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18966>	2022-10-14 10:36:56 +03:00
Alyssa Rosenzweig	f4b03ea6dc	nir/lower_system_values: Fix cs_local_index_to_id with variable workgroups In that case we need to use the sysval. That sysval can be optimized anyway in the nonvariable case. Fixes test_basic.get_linear_ids on panfrost. Fixes: `998d84fca5` ("nir/lower_system_values: Support lowering more intrinsics") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18662>	2022-10-13 21:25:23 +00:00
Georg Lehmann	00a8be3414	nir: Print nir_selection_control_divergent_always_taken. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>	2022-10-11 15:42:54 +00:00
Timur Kristóf	c0d0a7c176	nir: Add selection control enum for always taken divergent branches. The new enum is called nir_selection_control_divergent_always_taken, and it's almost the same as nir_selection_control_flatten. The main difference between the two is that "flatten" represents a choice made by the application but "divergent_always_taken" may be applied by the compiler stack when it thinks this is beneficial. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>	2022-10-11 15:42:54 +00:00
Timur Kristóf	a2ec843727	nir: Document the flatten/dont_flatten selection control options. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>	2022-10-11 15:42:53 +00:00
Gert Wollny	9ebe893a61	nir_lower_to_source_mods: Don't sneak in an abs modifier from parent If the abs source modifier is not supported for the current instruction because it has three sources, we may still see a parent mov that has the `abs` modifier. In this case we must not propagate that abs modifier from the parent instruction. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7350 Fixes: `cd73b6174b` nir/lower_to_source_mods: Stop turning add, sat, and neg into mov Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18902>	2022-10-04 08:36:57 +02:00
Emma Anholt	594b638d4f	nir/vars_to_ssa: Always do OOB load/store removal. We eliminated OOB loads while renaming vars to SSA. However, if the OOB load only appeared after some other passes had constant folded, there may be no renaming work to do, at which point we'd leave the OOB load deref around without renaming it or deleting it. For vc4, this was quite a surprise and caused a regression when we stop eliminating some OOB accesses at the GLSL level. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>	2022-10-03 17:18:31 +00:00
Emma Anholt	6c38797101	nir/nir_opt_copy_prop_vars: Don't leak dynarray memory during the pass. It was swept at the end, but it meant that in shaders with lots of copies available at the start of lots of if statements, you'd blow up memory usage. turnip memory consumption on dEQP-VK.ssbo.layout.random.scalar.75 drops from 1.4GB to 110MB, and runtime from 19s to 17s. Fixes: #7361 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18891>	2022-10-03 15:33:21 +00:00
SoroushIMG	1e8e785a07	nir: allow to fine tune unrolling for loops with soft fp64 ops Lowered fp64 ops can blow up the loop bodies while still being suitable for unrolling. Allow for using different parameters to unroll loops with soft fp64. Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18863>	2022-09-30 17:07:37 +00:00
SoroushIMG	121f30005f	nir: track whether a loop contains soft fp64 ops Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18863>	2022-09-30 17:07:37 +00:00
Georg Lehmann	bfb12a3b6a	nir/opt_algebraic: Optimize more (a cmp b ? a : b) to min/max. Foz-DB Navi21: Totals from 112 (0.08% of 134913) affected shaders: CodeSize: 1618384 -> 1618172 (-0.01%); split: -0.06%, +0.04% Instrs: 307695 -> 307535 (-0.05%); split: -0.05%, +0.00% Latency: 3590228 -> 3589658 (-0.02%); split: -0.02%, +0.00% InvThroughput: 563692 -> 563447 (-0.04%); split: -0.05%, +0.01% Copies: 24541 -> 24519 (-0.09%); split: -0.10%, +0.01% Branches: 13480 -> 13468 (-0.09%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18548>	2022-09-30 11:10:52 +00:00
Kenneth Graunke	c9d399604e	st/mesa: Optionally call nir_vectorize_tess_levels() This lets us vectorize gl_TessLevel{Inner,Outer} writes, using a pass developed for RADV. Not all backends are prepared to handle this, so we make it optional. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>	2022-09-27 18:17:47 -07:00
Mike Blumenkrantz	52edd8f764	nir/opt_undef: add a pass to clean up 64bit undefs Somehow 64bit lowering creates patterns like vec1 64 ssa_1 = undefined ssa_2 = unpack_64_2x32_split_x ssa_1 and then the 64bit value is never otherwise used. For this case, rewriting the unpack to just be a 32bit undef allows the 64bit undef to be optimized out, avoiding spec violations. Fixes #6945 SoroushIMG <soroush.kashani@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18728>	2022-09-27 18:38:25 +00:00
Timothy Arceri	40c32dfbb1	nir/loop_analyze: remove cost of redundant selects If we know that a select will be eliminated once the loop is unrolled, then we don't need to count the instruction towards the cost of the loop. This change helps 2 loops unroll in an XCOM: Enemy Unknown shader that is loaded full of these redundant selects. Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18587>	2022-09-27 00:31:47 +00:00
Timothy Arceri	13d0ae593b	nir/loop_analyze: delay instruction cost calculation Here we move the calculation of the instruction cost of the loop after we have processed other information such as finding the induction variables. This is useful because we can use this further information to find instructions that will be eliminated if the loop was to unroll and therefore give them a cost of 0. Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18587>	2022-09-27 00:31:47 +00:00
Marcin Ślusarz	2a723f7a8d	nir: use nir_shader_instructions_pass in nir_split_per_member_structs Changes: - nir_metadata_preserve(..., nir_metadata_block_index | nir_metadata_dominance) is called only when pass makes progress - nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't make progress Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12282>	2022-09-26 11:13:03 +00:00
Marcin Ślusarz	67fe9ae5c3	nir: use nir_shader_instructions_pass in nir_split_var_copies No functional changes. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12282>	2022-09-26 11:13:03 +00:00
Marcin Ślusarz	9dcff3ea53	nir: use nir_shader_instructions_pass in nir_lower_samplers No functional changes. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12282>	2022-09-26 11:13:03 +00:00
Marcin Ślusarz	865d959090	nir: use nir_shader_instructions_pass in nir_lower_interpolation Changes: - nir_metadata_preserve(..., nir_metadata_block_index | nir_metadata_dominance) is called only when pass makes progress - nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't make progress Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12282>	2022-09-26 11:13:03 +00:00
Marcin Ślusarz	6e0bcc1c4d	nir: use nir_metadata_none instead of its value Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12282>	2022-09-26 11:13:03 +00:00
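
Commit `741dbadae0` says that constant-folding `ifind_msb_rev` on `0x80000000` should return 1, not 0. The Python sketch below reproduces that expectation under one stated assumption: `ifind_msb_rev(x)` finds the same bit as a signed find-MSB (the most significant bit that differs from the sign bit), but counts its position from the MSB rather than the LSB, i.e. `31 - ifind_msb(x)`, with -1 for the inputs 0 and -1. The helper names are hypothetical, and this is not the NIR constant-folding code.

```python
def ref_ifind_msb(src):
    """Signed find-MSB on a 32-bit value: LSB-based index of the most
    significant bit that differs from the sign bit, or -1 for 0 and -1."""
    u = src & 0xFFFFFFFF  # two's-complement bit pattern of a 32-bit int
    sign = u >> 31
    for bit in range(30, -1, -1):
        if ((u >> bit) & 1) != sign:
            return bit
    return -1


def ref_ifind_msb_rev(src):
    """Same bit as ref_ifind_msb, but counted from the MSB (bit 31 -> 0).

    Assumption for this sketch: ifind_msb_rev(x) == 31 - ifind_msb(x)."""
    msb = ref_ifind_msb(src)
    return -1 if msb < 0 else 31 - msb


print(ref_ifind_msb_rev(-0x80000000))  # 0x80000000 as int32 -> 1
```

For `0x80000000` the sign bit is 1 and bit 30 is the first bit that differs from it. Counted from the top, that is position 1, which matches the value the commit message expects.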

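
Commit `fc5c671e87` fixes `nir_fmax_abs_vec_comp`, which computed max { x, |y|, |z| } instead of max { |x|, |y|, |z| }. Below is a short Python sketch of both formulas. It uses hypothetical helper names and plain floats instead of NIR SSA values. It also shows why the bug only showed up for some inputs: the two results agree whenever the first component is non-negative.

```python
def fmax_abs_vec_comp(vec):
    """Fixed formula: maximum of |c| over every component c."""
    return max(abs(c) for c in vec)


def fmax_abs_vec_comp_buggy(vec):
    """Pre-fix formula described in the commit: fabs is skipped on the first component."""
    return max([vec[0]] + [abs(c) for c in vec[1:]])


# Agrees when the first component is non-negative ...
print(fmax_abs_vec_comp([3.0, -1.0, 2.0]), fmax_abs_vec_comp_buggy([3.0, -1.0, 2.0]))  # 3.0 3.0
# ... but differs when the first component is negative and has the largest magnitude.
print(fmax_abs_vec_comp([-5.0, 1.0, 2.0]), fmax_abs_vec_comp_buggy([-5.0, 1.0, 2.0]))  # 5.0 2.0
```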

4732 commits
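
Commit `ac2964dfbd` proposes fusing `fmul` + `fadd` into `ffma` only when the `fmul` will become dead, which happens exactly when every use of the `fmul` is an `fadd`. The toy Python model below applies that rule to the two fragments quoted in the commit message. It uses a hypothetical dict-of-instructions IR, not NIR's search helpers, and it only models the decision, not the rewrite itself.

```python
from collections import defaultdict


def consumers(prog):
    """Map each SSA name to the opcodes of the instructions that read it."""
    uses = defaultdict(list)
    for op, srcs in prog.values():
        for src in srcs:
            uses[src].append(op)
    return uses


def fmuls_to_fuse(prog):
    """Names of fmul results that are safe to fuse into ffma.

    Rule from the commit: fuse only if every use of the fmul is an fadd, so
    that once all uses become ffma the fmul itself is dead code."""
    uses = consumers(prog)
    return {
        name
        for name, (op, _) in prog.items()
        if op == "fmul" and uses[name] and all(u == "fadd" for u in uses[name])
    }


# First fragment from the commit: `a` also feeds an fmin, so fusing would keep the fmul alive.
mixed_uses = {
    "a": ("fmul", ("x", "y")),
    "b": ("fadd", ("a", "z")),
    "c": ("fmin", ("a", "t")),
    "d": ("fmax", ("b", "c")),
}
# Second fragment: every use of `a` is an fadd, so both fadds can become ffma.
all_fadd_uses = {
    "a": ("fmul", ("x", "y")),
    "b": ("fadd", ("a", "z")),
    "c": ("fadd", ("a", "t")),
}

print(fmuls_to_fuse(mixed_uses))     # set()
print(fmuls_to_fuse(all_fadd_uses))  # {'a'}
```

Checking "all uses are fadd" rather than `is_used_once` keeps the profitable case in the second fragment. It still refuses the first fragment, where fusing would leave both an `fmul` and an `ffma` behind.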