fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-17 22:38:06 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	5717f13dff	nir/lower_shader_calls: add a pass to sort/pack values on the stack The previous pass shrinking values stored on the stack might have left some gaps on the stack (a vec4 turned into a vec3 for instance). This pass reorders variables on the stack, by component bit size and by ssa value number. The component size is useful to pack smaller values together. The ssa value number is also important because if we have 2 calls spilling the same values, then we can avoid re-emitting the spills if the values are stored in the same location. v2: Remove unused sorting function (Konstantin) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	4cd90ed7bc	nir/lower_shader_calls: add a pass to trim scratch values For example, if we store to scratch a vec4 but only a subset of components are used after the load operation. v2: Use nir_intrinsic_write_mask (Konstantin) Use u_foreach_bit() instead of u_bit_scan() (Konstantin) Fix mask building loop (Konstantin) v3: Fix reswizzle (Konstantin) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	1d10d17817	nir/lower_shader_calls: add an option structure for future optimizations Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	d0543bfbec	nir/lower_shader_calls: cleanup shaders a bit more post split Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	6d7e04d924	nir/lower_shader_calls: add NIR_PASS_V internally Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	dc70519c8a	nir/lower_shader_calls: rematerialize values in more complex cases Previously when considering whether to rematerialize or spill/fill ssa_1954, we would go for a spill/fill : vec4 32 ssa_388 = (float32)txf ssa_387 (texture_handle), ssa_86 (coord), ssa_23 (lod), 0 (texture), 0 (sampler) ... vec1 32 ssa_1953 = load_const (0xbd23d70a = -0.040000) vec1 32 ssa_1954 = fadd ssa_388.x, ssa_1953 vec1 32 ssa_1955 = fneg ssa_1954 This is because when looking at ssa_1955 the first time, we would consider ssa_388 unrematerializable, and therefore all values built on top of it would be considered unrematerializable as well. The missing piece when considering whether to rematerialize ssa_1954 is that we should look at filled values. Now that ssa_388 has been spilled/filled, we can rebuild ssa_1955 on top of the filled value and avoid spilling/filling ssa_1955 at all. This requires a bit more work though. We can't just look at an instruction in isolation, we need to go through the ssa chains until we find values we can rematerialize or not. In this change we build a list of all ssa values involved in building a given value, up to the point where we find a filled or a rematerializable value. In this particular case, looking at ssa_1955 : * We can rematerialize ssa_388 from its filled value * We can rematerialize ssa_1953 trivially * We can rematerialize ssa_1954 because its 2 inputs are rematerializable * We can rematerialize ssa_1955 because ssa_1954 is rematerializable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
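The chain walk described in the commit above (dc70519c8a) can be sketched roughly as follows. This is a simplified Python model, not the pass's C code: the `defs` table, the `REMAT_OPS` set, and the function name are hypothetical, and only the ssa names are taken from the commit message.

```python
# Hypothetical model: each ssa value maps to (opcode, list of source values).
defs = {
    "ssa_388": ("txf", ["ssa_387", "ssa_86", "ssa_23"]),
    "ssa_1953": ("load_const", []),
    "ssa_1954": ("fadd", ["ssa_388", "ssa_1953"]),
    "ssa_1955": ("fneg", ["ssa_1954"]),
}

# Assumption: opcodes considered cheap enough to recompute after a call.
REMAT_OPS = {"load_const", "fadd", "fneg"}


def can_rematerialize(value, filled, defs):
    """Return True if `value` can be rebuilt after the call using only
    filled values and rematerializable instructions."""
    if value in filled:
        return True   # already reloaded from the stack
    if value not in defs:
        return False  # unknown producer: stay conservative
    op, srcs = defs[value]
    if op not in REMAT_OPS:
        return False  # e.g. a texture fetch has to be spilled/filled
    # Walk the ssa chain: every source must itself be recoverable.
    return all(can_rematerialize(s, filled, defs) for s in srcs)
```

With `ssa_388` in the filled set, `can_rematerialize("ssa_1955", {"ssa_388"}, defs)` is true, so ssa_1955 needs no spill of its own; with an empty filled set the walk stops at the texture fetch and returns false.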
Lionel Landwerlin	ca2a1340a2	nir/lower_shader_calls: avoid respilling values Currently we do something like this : ssa_0 = ... ssa_1 = ... * spill ssa_0, ssa_1 call1() * fill ssa_0, ssa_1 ssa_2 = ... ssa_3 = ... * spill ssa_0, ssa_1, ssa_2, ssa_3 call2() * fill ssa_0, ssa_1, ssa_2, ssa_3 If we assign the same position to ssa_0 & ssa_1 in the spilling stack, then on call2(), we know that those values are already present in memory at the right location and we can avoid respilling them. The result would be something like this : ssa_0 = ... ssa_1 = ... * spill ssa_0, ssa_1 call1() * fill ssa_0, ssa_1 ssa_2 = ... ssa_3 = ... * spill ssa_2, ssa_3 call2() * fill ssa_0, ssa_1, ssa_2, ssa_3 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
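The effect of commit ca2a1340a2 can be illustrated with a small Python model. It assumes straight-line code and that every value keeps one fixed stack slot for the whole shader; `plan_spills` and its input shape are hypothetical and not part of the pass.

```python
def plan_spills(calls):
    """calls: list of sets of ssa values live across each call, in program
    order. Returns, per call, the values that actually need a store."""
    in_memory = set()  # values already sitting in their fixed stack slot
    plan = []
    for live in calls:
        # SSA values never change, so a value stored once stays valid.
        plan.append(sorted(live - in_memory))
        in_memory |= live
    return plan


plan = plan_spills([
    {"ssa_0", "ssa_1"},                    # live across call1()
    {"ssa_0", "ssa_1", "ssa_2", "ssa_3"},  # live across call2()
])
```

Here `plan` lists only ssa_2 and ssa_3 for the second call, matching the "after" listing in the commit message; the fills are unaffected and still reload all four values.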
Lionel Landwerlin	5a9f8d21d0	nir/lower_shader_calls: lower scratch access to format internally For a follow up optimization, we would like to track scratch loads. This isn't possible with global load/store intrinsics. So use a couple of special intrinsics in the pass and only lower them to global intrinsics at the end. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Lionel Landwerlin	df685b4f9c	nir/lower_shader_calls: rematerialize more trivial values Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16556>	2022-10-26 12:53:25 +00:00
Rhys Perry	382831c986	radv,nir: add intrinsics for streamout and GS copy shaders Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19302>	2022-10-25 17:35:08 +00:00
Qiang Yu	7fb506d068	nir: add nir_load_prim_xfb_query_enabled_amd Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>	2022-10-25 12:58:43 +00:00
Qiang Yu	a119a6464f	nir,ac,radv: add primitive count add intrinsics radeonsi uses a shader buffer, but radv uses gds for the query result storage. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17457>	2022-10-25 12:58:43 +00:00
Qiang Yu	83643e4dc8	nir,ac/nir/ngg,radv: split shader_query_enabled_amd For use by different counters. Vulkan: 1. VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT, sum generated primitives of all 4 streams when GS. 2. VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, count generated primitives for all 4 streams when VS/TES/GS. 3. VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT, count generated and streamout primitives for all 4 streams when VS/TES/GS. OpenGL: 1. GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB, sum generated primitives for all 4 streams when GS. 2. GL_PRIMITIVES_GENERATED, count generated primitives for all 4 streams when VS/TES/GS. 3. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, count streamout primitives for all 4 streams when VS/TES/GS. pipeline_stat_query_enabled_amd is for Vulkan 1 and OpenGL 1. xfb_query_enabled_amd is for Vulkan 2/3 and OpenGL 2/3. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19015>	2022-10-25 02:42:52 +00:00
Rob Clark	a8e84f50bc	nir: Add helper to create passthrough TCS shader Based on si_create_passthrough_tcs() as that seemed the most generic of the various different backend driver implementations. Uses the load_tess_level_outer_default and load_tess_level_inner_default intrinsics to load the gl_TessLevelOuter and gl_TessLevelInner values, so driver will somehow need to implement those to load the values set by pipe_context::set_tess_state() or similar. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19259>	2022-10-24 21:39:38 +00:00
Dave Airlie	55d2b82cc0	glsl/types: fix dword slots calc for float16 matrices. The current uniform query uploader for mat3 calcs things as if the vector elements are f16vec4 wide, so fix the calcs here to do the same. Fixes GTF-GL46.gtf21.GL.mat3.mat3arraysimple_frag on llvmpipe when 16-bit uniform lowering is allowed. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14817>	2022-10-23 01:43:44 +00:00
Alyssa Rosenzweig	80de33cf6a	nir/opt_preamble: Move load_texture_base_agx nir_opt_preamble will be crucial to optimize out the lowering for array textures on AGX, which involves this AGX-specific sysval. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>	2022-10-22 14:48:04 -04:00
Georg Lehmann	741dbadae0	nir: Fix ifind_msb_rev constant folding. For example if src0 is 0x80000000 we should return 1, not 0. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `a5747f8ab3` ("nir: add opcodes for *find_msb_rev and lowering") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:55 +02:00
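A minimal Python model of the folding that commit 741dbadae0 fixes, assuming ifind_msb_rev returns the index, counted from the most significant bit, of the first bit that differs from the sign bit, and -1 when no such bit exists. The commit message only confirms the 0x80000000 case; the rest of the semantics and the function body are an assumption for illustration, not the code in nir_opcodes.py.

```python
def ifind_msb_rev(src0):
    """Model for a 32-bit input: position from the MSB of the first bit
    that differs from the sign bit, or -1 if all bits equal the sign."""
    src0 &= 0xFFFFFFFF
    sign = (src0 >> 31) & 1
    for bit in range(32):
        # bit 0 is the sign bit itself, so it never differs.
        if ((src0 >> (31 - bit)) & 1) != sign:
            return bit
    return -1
```

For 0x80000000 the sign bit is set and the next bit down is clear, so the model returns 1, the value the commit message says constant folding should produce.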
Georg Lehmann	125741dbae	nir/opt_algebraic: Optimize various find_msb_rev patterns. From dxvk, dxil-spirv, fxc, dxc and others. Totals from 177 (0.13% of 134913) affected shaders: CodeSize: 1079504 -> 1059872 (-1.82%) Instrs: 195381 -> 192269 (-1.59%) Latency: 3664137 -> 3631951 (-0.88%) InvThroughput: 599479 -> 585675 (-2.30%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:33 +02:00
Georg Lehmann	7505be3497	nir/opt_algebraic: Add an option to lower uclz. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:57:10 +02:00
Georg Lehmann	1e552b9c95	nir/opt_algebraic: Mirror optimizations for find_msb_rev. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18951>	2022-10-22 11:56:44 +02:00
Emma Anholt	22f7f167cd	nir/opt_phi_precision: Fix missing swizzles when narrowing phi srcs. This NIR: vec4 32 ssa_169 = phi block_1: ssa_168, block_2: ssa_138 vec1 16 ssa_209 = f2fmp ssa_169.x vec1 16 ssa_210 = f2fmp ssa_169.y vec1 16 ssa_211 = f2fmp ssa_169.z vec1 16 ssa_212 = f2fmp ssa_169.w vec4 16 ssa_213 = vec4 ssa_209, ssa_210, ssa_211, ssa_212 intrinsic store_output (ssa_213, ssa_171) (base=0, wrmask=xyzw /15/, component=0, src_type=float16 /144/, io location=4 slots=1 mediump /8388740/, xfb() /0/, xfb2() /0/) would turn into: vec4 32 ssa_169 = phi block_1: ssa_168, block_2: ssa_138 vec4 16 ssa_216 = phi block_1: ssa_214, block_2: ssa_215 vec1 16 ssa_209 = f2fmp ssa_169.x vec1 16 ssa_210 = f2fmp ssa_169.y vec1 16 ssa_211 = f2fmp ssa_169.z vec1 16 ssa_212 = f2fmp ssa_169.w vec4 16 ssa_213 = vec4 ssa_216.x, ssa_216.x, ssa_216.x, ssa_216.x intrinsic store_output (ssa_213, ssa_171) (base=0, wrmask=xyzw /15/, component=0, src_type=float16 /144/, io location=4 slots=1 mediump /8388740/, xfb() /0/, xfb2() /0/) ignoring the swizzles from the f2fmp srcs. Fixes failures in dEQP-GLES2.functional.shaders.random.all_features.fragment.20 on turnip+ANGLE. Fixes: `c7b935962b` ("nir: Add pass to lower phi precision") Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19179>	2022-10-22 03:06:31 +00:00
Timur Kristóf	e52c2f4fca	nir, ac, aco: Add index src to load_buffer_amd/store_buffer_amd. Also modify all existing uses to pass a zero to this new src. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>	2022-10-20 20:00:50 +00:00
Timur Kristóf	c918f0934e	nir, ac, aco: Add ACCESS intrinsic index to load/store_buffer_amd. Previously, we always treated these as coherent, but now let's make this configurable. Also set all current users to ACCESS_COHERENT. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> (nir) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17551>	2022-10-20 20:00:49 +00:00
Rhys Perry	1ae73bc076	nir/algebraic: optimize b<<a + c<<a fossil-db (navi21): Totals from 248 (0.18% of 135636) affected shaders: Instrs: 85836 -> 85611 (-0.26%); split: -0.27%, +0.00% CodeSize: 481304 -> 480332 (-0.20%); split: -0.21%, +0.00% Latency: 9596559 -> 9596152 (-0.00%); split: -0.00%, +0.00% InvThroughput: 1423707 -> 1423670 (-0.00%) SClause: 3872 -> 3874 (+0.05%) PreSGPRs: 5034 -> 5038 (+0.08%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19137>	2022-10-20 18:57:23 +00:00
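The title of commit 1ae73bc076 does not spell out the replacement pattern; presumably it is the factored form `(b + c) << a`, which saves one shift. Under that assumption, the identity holds for wrapping 32-bit arithmetic, as this small Python check illustrates (all helper names are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def shl32(x, a):
    """32-bit left shift with wraparound."""
    return (x << a) & MASK32


def before(a, b, c):
    """Original pattern: two shifts and one add."""
    return (shl32(b, a) + shl32(c, a)) & MASK32


def after(a, b, c):
    """Assumed replacement: one add and one shift."""
    return shl32((b + c) & MASK32, a)


# Spot-check a few values, including ones that overflow 32 bits.
samples = [0, 1, 3, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678]
all_equal = all(
    before(a, b, c) == after(a, b, c)
    for a in range(32)
    for b in samples
    for c in samples
)
```

Both forms compute (b + c) * 2^a modulo 2^32, which is why the overflow cases agree as well.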
Samuel Pitoiset	09033c7b22	nir: add nir_intrinsic_load_ring_attr_{offset}_amd These intrinsics will be used to lower NGG attributes to memory on GFX11. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19173>	2022-10-20 15:59:44 +00:00
Qiang Yu	58e006b174	nir,ac/llvm,radv: add nir_intrinsic_load_provoking_vtx_in_prim_amd For radeonsi, which loads this from an arg. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19166>	2022-10-20 06:53:56 +00:00
Karol Herbst	d7156e5d9c	nir/lower_cl_images: set binding Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19151>	2022-10-19 06:27:20 +00:00
Timothy Arceri	b4b2fd0bb4	glsl: move lower instructions logic inside that pass There is now only a single caller of this pass so tidy things up and move all this logic inside the pass. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19112>	2022-10-19 03:52:21 +00:00
Timothy Arceri	e5102a406f	glsl: always do {CARRY,BORROW}_TO_ARITH lowering The only caller always sets these so here we just remove the option to disable it. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19112>	2022-10-19 03:52:21 +00:00
Timothy Arceri	9f14c5dae2	glsl: drop sub to add neg lowering in GLSL IR NIR opt algebraic does this for us so no need to have it implemented here also. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19112>	2022-10-19 03:52:21 +00:00
Timothy Arceri	a31c547206	glsl: move rule inside lower_packing_builtins() We only have a single user of this pass so let's tidy things up and move all the rules in the pass itself. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19112>	2022-10-19 03:52:20 +00:00
Alyssa Rosenzweig	78adf44839	nir/lower_io: Set interpolated_input dest_type ...even for non-pixel interpolation, for consistency. Otherwise backends get funny intrinsics with interpolateAt: vec4 32 ssa_4 = intrinsic load_interpolated_input (ssa_3, ssa_2) (base=1, component=0, dest_type=invalid /0/, io location=33 slots=1 /161/) We know it'll be a float, but backends shouldn't need to special case this. (Or maybe interpolated_input shouldn't have a dest_type index. I'd be ok with that resolution too. But having one and not setting it consistently is wrong.) Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19085>	2022-10-18 21:08:54 +00:00
Yonggang Luo	a9da108c6b	nir: No need to redefine snprintf anymore in nir.h Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18685>	2022-10-18 03:16:00 +00:00
Timothy Arceri	ac7f4e0942	glsl/glsl_to_nir: remove unreachable code This hack in glsl_to_nir() to clean up after the glsl ir linker should no longer be reachable. These types of linking opts are now done via a nir based linker long after GLSL IR has been converted to nir by this pass. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19104>	2022-10-18 00:05:29 +00:00
Alyssa Rosenzweig	2c7be4d421	nir: Usher nir_normalize_cubemap_coords into 2022 I stumbled upon this old NIR pass (still in use by intel and broadcom) and noticed how most of the code was NIR boilerplate that we have helpers for. Rewrite the pass to use all the helpers. v2: Fix cube map arrays. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18754>	2022-10-17 20:46:24 +00:00
Alyssa Rosenzweig	fc5c671e87	nir: Fix nir_fmax_abs_vec_comp This failed to take fabs of the first component, implementing an unintended formula that would return the right results in some common cases but is wrong in general: max { x, |y|, |z| } instead of the intended max { |x|, |y|, |z| } Reexpress the implementation to make correctness obvious. Fixes: `272e927d0e` ("nir/spirv: initial handling of OpenCL.std extension opcodes") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18754>	2022-10-17 20:46:24 +00:00
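The difference between the two formulas in commit fc5c671e87 shows up as soon as the first component is negative with the largest magnitude. A short Python illustration follows; the function names are hypothetical stand-ins, not the NIR builder helper itself.

```python
def fmax_abs_vec_comp_buggy(v):
    """Unintended formula: max { x, |y|, |z| }."""
    x, *rest = v
    result = x  # bug: no abs on the first component
    for c in rest:
        result = max(result, abs(c))
    return result


def fmax_abs_vec_comp(v):
    """Intended formula: max { |x|, |y|, |z| }."""
    return max(abs(c) for c in v)


v = (-5.0, 1.0, 2.0)
buggy = fmax_abs_vec_comp_buggy(v)
fixed = fmax_abs_vec_comp(v)
```

For `v = (-5.0, 1.0, 2.0)` the buggy form yields 2.0 while the intended form yields 5.0. The two agree whenever the first component is non-negative, which is why the bug could go unnoticed in common cases.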
Alyssa Rosenzweig	ac2964dfbd	nir: Be smarter fusing ffma If there is a single use of fmul, and that single use is fadd, it makes sense to fuse ffma, as we already do. However, if there are multiple uses, fusing may impede code gen. Consider the source fragment: a = fmul(x, y) b = fadd(a, z) c = fmin(a, t) d = fmax(b, c) The fmul has two uses. The current ffma fusing is greedy and will produce the following "optimized" code. a = fmul(x, y) b = ffma(x, y, z) c = fmin(a, t) d = fmax(b, c) Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1 fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of one multiply and an add. Depending on the ISA, that could impede scheduling or increase code size. It can also increase register pressure, extending the live range. It's tempting to gate on is_used_once, but that would hurt in cases where we really do fuse everything, e.g.: a = fmul(x, y) b = fadd(a, z) c = fadd(a, t) For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2 fadd. So what we really want is to fuse ffma iff the fmul will get deleted. That occurs iff all uses of the fmul are fadd and will themselves get fused to ffma, leaving fmul to get dead code eliminated. That's easy to implement with a new NIR search helper, checking that all uses are fadd. 
shader-db results on Mali-G57 [open shader-db + subset of closed]: total instructions in shared programs: 179491 -> 178991 (-0.28%) instructions in affected programs: 36862 -> 36362 (-1.36%) helped: 190 HURT: 27 total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%) cycles in affected programs: 72.02 -> 70.56 (-2.02%) helped: 28 HURT: 1 total fma in shared programs: 1590.47 -> 1582.61 (-0.49%) fma in affected programs: 319.95 -> 312.09 (-2.46%) helped: 194 HURT: 1 total cvt in shared programs: 812.98 -> 813.03 (<.01%) cvt in affected programs: 118.53 -> 118.58 (0.04%) helped: 65 HURT: 81 total quadwords in shared programs: 98968 -> 98840 (-0.13%) quadwords in affected programs: 2960 -> 2832 (-4.32%) helped: 20 HURT: 4 total threads in shared programs: 4693 -> 4697 (0.09%) threads in affected programs: 4 -> 8 (100.00%) helped: 4 HURT: 0 v2: Update trace checksums for virgl due to numerical differences. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>	2022-10-15 17:47:31 +00:00
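The fusing rule from commit ac2964dfbd (fuse only when every use of the fmul is an fadd) can be modelled in a few lines of Python. The instruction table format and `fusable_fmuls` are hypothetical; the real change is a NIR search helper used from the algebraic rules.

```python
def fusable_fmuls(instrs):
    """instrs: dict mapping a value name to (opcode, [source names]).
    Returns the fmul values whose every use is an fadd, i.e. the ones
    that become dead once all their users are fused to ffma."""
    uses = {}
    for _name, (op, srcs) in instrs.items():
        for s in srcs:
            uses.setdefault(s, []).append(op)
    return {
        name
        for name, (op, _srcs) in instrs.items()
        if op == "fmul"
        and uses.get(name)  # an unused fmul has nothing to fuse into
        and all(u == "fadd" for u in uses[name])
    }


# First fragment from the commit message: fmul feeds fadd and fmin.
mixed = {
    "a": ("fmul", ["x", "y"]),
    "b": ("fadd", ["a", "z"]),
    "c": ("fmin", ["a", "t"]),
    "d": ("fmax", ["b", "c"]),
}

# Second fragment: fmul feeds two fadds.
all_fadd = {
    "a": ("fmul", ["x", "y"]),
    "b": ("fadd", ["a", "z"]),
    "c": ("fadd", ["a", "t"]),
}
```

`fusable_fmuls(mixed)` is empty, so the fmul and fadd are left alone, while `fusable_fmuls(all_fadd)` contains `a`, so both fadds fuse to ffma and the fmul can be eliminated as dead code.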
Italo Nicola	66b3df3c15	clc: add 32-bit target Signed-off-by: Italo Nicola <italonicola@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18985>	2022-10-15 02:23:03 +00:00
Gert Wollny	2e50bf19cd	nir: move fusing csel and comparisons to opt_late_algebraic With that simple comparisons are cleaned up properly. This helps with some tessellation shaders on r600. Shader-db stats R600/Cayman: -------------------------------------------------------------- total dw in shared programs: 1621806 -> 1620884 (-0.06%) dw in affected programs: 41650 -> 40728 (-2.21%) helped: 211 HURT: 4 helped stats (abs) min: 2 max: 26 x̄: 4.46 x̃: 4 helped stats (rel) min: 0.30% max: 9.68% x̄: 2.87% x̃: 2.52% HURT stats (abs) min: 2 max: 8 x̄: 5.00 x̃: 5 HURT stats (rel) min: 0.23% max: 1.67% x̄: 1.02% x̃: 1.09% 95% mean confidence interval for dw value: -4.81 -3.77 95% mean confidence interval for dw %-change: -3.03% -2.57% Dw are helped. total gprs in shared programs: 41192 -> 41182 (-0.02%) gprs in affected programs: 731 -> 721 (-1.37%) helped: 53 HURT: 45 helped stats (abs) min: 1 max: 3 x̄: 1.23 x̃: 1 helped stats (rel) min: 5.88% max: 40.00% x̄: 16.56% x̃: 14.29% HURT stats (abs) min: 1 max: 2 x̄: 1.22 x̃: 1 HURT stats (rel) min: 7.69% max: 40.00% x̄: 19.42% x̃: 20.00% 95% mean confidence interval for gprs value: -0.37 0.16 95% mean confidence interval for gprs %-change: -3.92% 3.85% Inconclusive result (value mean confidence interval includes 0). total alu_groups in shared programs: 203677 -> 203632 (-0.02%) alu_groups in affected programs: 2876 -> 2831 (-1.56%) helped: 68 HURT: 30 helped stats (abs) min: 1 max: 4 x̄: 1.46 x̃: 1 helped stats (rel) min: 0.84% max: 25.00% x̄: 7.48% x̃: 5.41% HURT stats (abs) min: 1 max: 6 x̄: 1.80 x̃: 1 HURT stats (rel) min: 1.98% max: 33.33% x̄: 10.09% x̃: 5.61% 95% mean confidence interval for alu_groups value: -0.81 -0.11 95% mean confidence interval for alu_groups %-change: -4.20% <.01% Alu_groups are helped. 
total loops in shared programs: 72 -> 72 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cf in shared programs: 88230 -> 88233 (<.01%) cf in affected programs: 71 -> 74 (4.23%) helped: 1 HURT: 4 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.89% max: 33.33% x̄: 17.14% x̃: 16.67% 95% mean confidence interval for cf value: -0.51 1.71 95% mean confidence interval for cf %-change: -24.20% 38.29% Inconclusive result (value mean confidence interval includes 0). total stack in shared programs: 3827 -> 3827 (0.00%) stack in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 45.32 -> 41.69 (-8.01%) -------------------------------------------------------------- v2: Simplify replacement pattern (Rhys Perry) v3: fix ws (Alexander Orzechowski) v4: move the original lowering to opt_late_algebraic and drop cleanup code (Alyssa) v5: Add shader-db stats (Alyssa) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18970>	2022-10-14 13:08:15 +00:00
Lionel Landwerlin	eec49374b0	nir: fix NIR_DEBUG=validate_ssa_dominance validate_ssa_def_dominance() asserts : validate_assert(state, !BITSET_TEST(state->ssa_defs_found, def->index)); Because the previous validation leaves bits set when it processes the IR. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18966>	2022-10-14 10:36:56 +03:00
Alyssa Rosenzweig	f4b03ea6dc	nir/lower_system_values: Fix cs_local_index_to_id with variable workgroups In that case we need to use the sysval. That sysval can be optimized anyway in the nonvariable case. Fixes test_basic.get_linear_ids on panfrost. Fixes: `998d84fca5` ("nir/lower_system_values: Support lowering more intrinsics") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18662>	2022-10-13 21:25:23 +00:00
Georg Lehmann	00a8be3414	nir: Print nir_selection_control_divergent_always_taken. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>	2022-10-11 15:42:54 +00:00
Timur Kristóf	c0d0a7c176	nir: Add selection control enum for always taken divergent branches. The new enum is called nir_selection_control_divergent_always_taken, and it's almost the same as nir_selection_control_flatten. The main difference between the two is that "flatten" represents a choice made by the application but "divergent_always_taken" may be applied by the compiler stack when it thinks this is beneficial. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>	2022-10-11 15:42:54 +00:00
Timur Kristóf	a2ec843727	nir: Document the flatten/dont_flatten selection control options. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-By: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>	2022-10-11 15:42:53 +00:00
Gert Wollny	9ebe893a61	nir_lower_to_source_mods: Don't sneak in an abs modifier from parent If the abs source modifier is not supported for the current instruction because it is an instruction with three sources we may still see a parent mov that has the `abs` modifier. In this case we must not propagate that abs modifier from that parent instruction. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7350 Fixes: `cd73b6174b` nir/lower_to_source_mods: Stop turning add, sat, and neg into mov Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18902>	2022-10-04 08:36:57 +02:00
Emma Anholt	24607ce7d3	glsl: Remove lower_vec_index_to_swizzle. GLSL's lower_vector_derefs already does this, and even if it didn't nir_vector_extract() would when glsl-to-nir happens. No effect on freedreno shader-db. Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>	2022-10-03 17:18:31 +00:00
Emma Anholt	a2a6995352	glsl: remove opt_structure_splitting. nir_lower_vars_to_ssa will split temp structs up anyway. This fixes a bug where mediump wouldn't be propagated to the split vars. The effect is tiny, I think just shuffling some code scheduling from optimizing at different places. Affects Natural Selection 2, Serious Sam 3, 3dmark slingshot, and Lego Legacy. freedreno shader-db: total instructions in shared programs: 11315637 -> 11315993 (<.01%) instructions in affected programs: 24861 -> 25217 (1.43%) Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>	2022-10-03 17:18:31 +00:00
Emma Anholt	a19c0ce9b2	glsl: Remove opt_array_splitting. nir_lower_vars_to_ssa will split temp arrays up anyway. Fixes a bug where split arrays wouldn't get their precision qualifier. Helps mostly Android and skia shaders. Also affects Civ5, Witcher 2, and Borderlands 2. freedreno shader-db: total instructions in shared programs: 11319395 -> 11319355 (<.01%) instructions in affected programs: 65744 -> 65704 (-0.06%) Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>	2022-10-03 17:18:31 +00:00
Emma Anholt	f862f9112f	glsl: Remove do_set_program_inouts. No longer used since `214c774ba6` ("mesa/st: Remove st_glsl_to_tgsi."). Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>	2022-10-03 17:18:31 +00:00
Emma Anholt	e5248fb53e	glsl: Remove lower_output_reads. No longer used since `214c774ba6` ("mesa/st: Remove st_glsl_to_tgsi."). Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18466>	2022-10-03 17:18:31 +00:00

7360 commits