fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-23 22:00:13 +01:00

Author	SHA1	Message	Date
Ian Romanick	0946108298	intel/fs: Simplify check in can_propagate_from The larger predicate here already requires that inst->opcode must be BRW_OPCODE_MOV, so it can't be BRW_OPCODE_SEL. With that removed, the other simplifications are pretty straightforward. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	1f15a0f8b2	intel/fs: Don't loop in try_constant_propagate The caller already loops over the sources. This means that the caller must loop over the sources in reverse because constant propagation prefers to propagate into the last sources first. The shader-db and fossil-db changes (below) are all due to SEL instructions. Changing the order sources are visited changes whether a SEL with two immediate sources is (+f0.0) sel g12 IMM_A IMM_B or (-f0.0) sel g12 IMM_B IMM_A The ordering of the sources affects the order the constant combining encounters the values, and that determines which value is "combined" and which value remains an immediate. This affects the results by luck. If there are two instructions: (+f0.0) sel g12 IMM_A IMM_B (+f0.0) sel g13 IMM_A IMM_C Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the selection algorithm in constant combining is greedy, this case requires that the algorithm see the values in just the right order for the right thing to happen. v2: Rebase on many, many changes. Move instruction source fixup reordering out of try_constant_propagate. v3: Rebase on !7698. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
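The order dependence described in the commit above can be illustrated with a toy model. This is not Mesa code: `registers_needed` and the `pick` callback are hypothetical names, and the model assumes only that each SEL may keep one immediate while the other value has to live in a register.

```python
def registers_needed(sels, pick):
    """Toy greedy selection (hypothetical, not the Mesa pass).

    sels: list of (first_imm, second_imm) pairs, one per SEL.
    pick: chooses which of the two immediates to load when neither
          is already in a register.
    Returns the values loaded into registers, in load order.
    """
    loaded = []
    for a, b in sels:
        # If either value is already in a register, reuse it and keep
        # the other one as the instruction's single immediate.
        if a in loaded or b in loaded:
            continue
        loaded.append(pick(a, b))
    return loaded


sels = [("IMM_A", "IMM_B"), ("IMM_A", "IMM_C")]
print(registers_needed(sels, lambda a, b: a))  # loads only the shared value
print(registers_needed(sels, lambda a, b: b))  # loads two separate values
```

In this model, loading the first-visited value covers both instructions with one register, while loading the second-visited value needs two, which is the "luck" the commit message refers to.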
Ian Romanick	ab23d89ade	intel/fs: Move src.file checks out of try_constant_propagate and try_copy_propagate Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:23 +00:00
Ian Romanick	b5b2338c5c	intel/fs: Make try_constant_propagate and try_copy_propagate file private This annoyed me during development of this MR. Every time I changed the parameters to this internal function, I had to modify a public header file... and trigger a much larger rebuild. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:22 +00:00
Ian Romanick	8665e37960	intel/fs: Don't try to copy propagate into a source again after progress is made If the linked list structure used depended on the list head to know when to terminate, this would be a pretty serious bug. If try_constant_propagate or try_copy_propagate makes progress, inst->src[i].nr will change. This results in the foreach_in_list using a different list header on later iterations of the loop. This causes two shaders in shader-db and 9 shaders in fossil-db to change. Looking at the code changes, these are cases where there was a copy of a copy that gets propagated. The part that confuses me is the VGRF numbers involved should not hash to the same bucket, so it should be impossible to find the original source from the intermediate VGRF. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:22 +00:00
Ian Romanick	e488b46419	intel/fs: Don't continue fixed point iteration just because liveout changes Unless the change in liveout also causes livein to change, updates to liveout cannot have any global effect. Changes to livein already flag additional iteration. I had additional changes in this area that didn't pan out. While working on those changes, I was a little confused about this bit of code. It's unnecessary, so it's better to delete it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>	2023-09-14 22:31:22 +00:00
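The termination rule in the commit above can be sketched with a generic textbook backward liveness analysis. This is an assumption-laden illustration, not the pass the commit modifies: the function name, the block and variable names, and even the direction of the dataflow are hypothetical and may differ from the Mesa code. The only point shown is that another iteration is requested when `livein` changes, never for a `liveout` change alone.

```python
def liveness(blocks, succs, use, defs):
    """Generic backward liveness to a fixed point (illustrative only)."""
    livein = {b: set() for b in blocks}
    liveout = {b: set() for b in blocks}
    progress = True
    while progress:
        progress = False
        for b in reversed(blocks):
            out = set()
            for s in succs[b]:
                out |= livein[s]
            # liveout is simply refreshed; a change here does not by
            # itself ask for another pass.
            liveout[b] = out
            new_in = use[b] | (out - defs[b])
            if new_in != livein[b]:
                # Only a livein change can affect other blocks.
                livein[b] = new_in
                progress = True
    return livein, liveout


blocks = ["entry", "loop", "exit"]
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
use = {"entry": set(), "loop": {"i"}, "exit": {"r"}}
defs = {"entry": {"i", "r"}, "loop": {"i"}, "exit": set()}
livein, liveout = liveness(blocks, succs, use, defs)
print(sorted(livein["loop"]), sorted(liveout["loop"]))
```

In this example the last pass still updates `liveout["loop"]`, but no `livein` set changes, so the loop stops with consistent results.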
Caio Oliveira	3890c60584	compiler/types: Remove unused GLSL_TYPE_FUNCTION and related functions GLSL doesn't use that type. SPIR-V used it for a while but later started relying on its own data structures and stopped using it. See `ca62e849d3` ("nir/spirv: Stop using glsl_type for function types") If we were ever to add this one again, it would be better to have a way to grab a key for lookup that did not require allocations; right now that's needed to inject return type as the first element in params array. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25160>	2023-09-12 23:18:12 +00:00
Iván Briano	f1bc58cb7b	intel/fs: use ffsll so we don't explode on 32 bits Fixes: `b200e5765c` ("anv: use a simpler MUE layout for fast linked libraries") Tested-by: Mark Janes <markjanes@swizzler.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25192>	2023-09-12 22:42:38 +00:00
Iván Briano	4eddeea7bf	intel/fs: handle URB setup for fast linked mesh pipelines Up until now, the mesh pipeline assumed it would be always linked to the fragment shader, and so the calculated MUE map would always be available. That is not the case for fast linked pipeline libraries, so the URB setup needs to account for this. We do this by replicating what's done for non-mesh pipelines, defining the URB based on the FS inputs, and always assuming they will be laid out in order of varying number, except that we also account for per-primitive attributes. Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.* Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>	2023-09-12 02:51:31 +00:00
Iván Briano	17d7f7a292	intel/fs: read viewport and layer from the FS payload Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>	2023-09-12 02:51:31 +00:00
Iván Briano	d36da7c5f8	anv: track what kind of pipeline a fragment shader may be used with Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>	2023-09-12 02:51:31 +00:00
Iván Briano	b200e5765c	anv: use a simpler MUE layout for fast linked libraries The compaction introduced in `a252123363` ("intel/compiler/mesh: compactify MUE layout") is not suitable for the case where graphics pipeline libraries are fast linked, as the fragment shader won't receive the mue_map to know where to locate its inputs. For that case, keep doing what we did before and lay things down in the order varyings are defined, which is also how it works for the non-mesh case. Fixes dEQP-VK.fragment_shading_rate.fast_linked_library.ms Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>	2023-09-12 02:51:31 +00:00
Dave Airlie	bfe152916c	nir: move the libclc lowering over to functions file. This lowering is pretty generic, and I want to enhance it for times when we don't want to inline. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24687>	2023-09-12 01:57:50 +00:00
Ian Romanick	8ce4d7a08d	intel/compiler: Don't evict for workgroup-scope fences Flushing and invalidating caches isn't necessary for workgroup scope fences. In fact, the DP_FLUSH_TYPE docs (BSpec 54041) say: "If the fence scope is Local or Threadgroup, HW ignores the flush type and operates as if it was set to None(no flush)" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>	2023-09-09 04:41:25 +00:00
Ian Romanick	5eddf60e56	intel/compiler: Combine control barriers with identical memory semantics This prevents the second barrier generating a spurious, identical fence message as the first barrier. fossil-db stats on Alchemist: Totals: Instrs: 196513342 -> 196512777 (-0.00%); split: -0.00%, +0.00% Cycles: 14271426028 -> 14271404569 (-0.00%); split: -0.00%, +0.00% Send messages: 8021892 -> 8021770 (-0.00%) Totals from 46 (0.01% of 653252) affected shaders: Instrs: 76761 -> 76196 (-0.74%); split: -0.75%, +0.01% Cycles: 2027946 -> 2006487 (-1.06%); split: -1.45%, +0.39% Send messages: 7589 -> 7467 (-1.61%) Nothing in shader-db was affected. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>	2023-09-09 04:41:25 +00:00
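A toy model of the barrier combining described in the commit above. This is not the Mesa implementation: the `Barrier` fields and the `combine_barriers` helper are hypothetical, and the sketch only merges barriers that are back to back and identical in every field, which is a narrower rule than the real pass may apply.

```python
from typing import NamedTuple


class Barrier(NamedTuple):
    # Hypothetical fields, chosen for illustration only.
    execution_scope: str
    memory_scope: str
    memory_semantics: str


def combine_barriers(instrs):
    """Drop a barrier that immediately follows an identical barrier."""
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        if isinstance(ins, Barrier) and isinstance(prev, Barrier) and ins == prev:
            # The second barrier would only repeat the same fence.
            continue
        out.append(ins)
    return out


b = Barrier("workgroup", "workgroup", "acq_rel")
print(combine_barriers(["load", b, b, "store"]))
```

Barriers separated by another instruction, or differing in any field, are left alone in this sketch.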
Timothy Arceri	84e0f5ce75	nir: remove unused param from nir_alu_src_copy() Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24986>	2023-09-08 03:01:39 +00:00
Lionel Landwerlin	c9739e8912	intel/fs: limit register flag interaction of FIND_*LIVE_CHANNEL Those instructions do not access the flag registers on Gfx8+. Removing the interaction enables CSE to remove more of those instructions. Results are a bit mixed (DG2 vulkan fossils): ACO: Totals from 127 (5.97% of 2128) affected shaders: Instrs: 139966 -> 138972 (-0.71%); split: -0.85%, +0.14% Cycles: 1685747 -> 1667480 (-1.08%); split: -2.35%, +1.26% Max live registers: 10582 -> 10544 (-0.36%) Max dispatch width: 1048 -> 1040 (-0.76%) Cyberpunk 2077: Totals from 2879 (27.95% of 10301) affected shaders: Instrs: 4264789 -> 4225666 (-0.92%); split: -1.01%, +0.09% Cycles: 72380209 -> 71619521 (-1.05%); split: -1.63%, +0.58% Subgroup size: 30624 -> 30632 (+0.03%) Spill count: 98 -> 101 (+3.06%) Fill count: 90 -> 93 (+3.33%) Scratch Memory Size: 8192 -> 9216 (+12.50%) Max live registers: 217807 -> 217098 (-0.33%); split: -0.59%, +0.26% Max dispatch width: 23792 -> 24112 (+1.34%) Gaining 40 SIMD16 shaders Rise Of The Tomb Raider: Totals from 622 (5.06% of 12289) affected shaders: Instrs: 437380 -> 434760 (-0.60%); split: -0.72%, +0.12% Cycles: 261843085 -> 261580703 (-0.10%); split: -0.73%, +0.63% Max live registers: 27731 -> 27766 (+0.13%); split: -1.01%, +1.14% Max dispatch width: 5832 -> 5432 (-6.86%); split: +0.27%, -7.13% Losing 26 SIMD32 shaders Strange Brigade: Totals from 1298 (31.48% of 4123) affected shaders: Instrs: 1504408 -> 1487968 (-1.09%); split: -1.17%, +0.08% Cycles: 20735976 -> 20443216 (-1.41%); split: -1.60%, +0.19% Max live registers: 89911 -> 89957 (+0.05%) DG2 shader-db run: total instructions in shared programs: 23130895 -> 23130036 (<.01%) instructions in affected programs: 260956 -> 260097 (-0.33%) helped: 234 HURT: 101 helped stats (abs) min: 1 max: 54 x̄: 6.36 x̃: 4 helped stats (rel) min: 0.05% max: 8.16% x̄: 2.01% x̃: 1.90% HURT stats (abs) min: 1 max: 37 x̄: 6.23 x̃: 3 HURT stats (rel) min: 0.02% max: 5.67% x̄: 0.89% x̃: 0.55% 95% mean confidence interval for instructions value: -3.62 -1.51 95% mean confidence interval for instructions %-change: -1.33% -0.94% Instructions are helped. total loops in shared programs: 6071 -> 6071 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 898610645 -> 898557166 (<.01%) cycles in affected programs: 18308201 -> 18254722 (-0.29%) helped: 315 HURT: 48 helped stats (abs) min: 1 max: 19312 x̄: 404.23 x̃: 128 helped stats (rel) min: 0.02% max: 28.98% x̄: 3.92% x̃: 2.65% HURT stats (abs) min: 2 max: 14478 x̄: 1538.60 x̃: 409 HURT stats (rel) min: <.01% max: 23.24% x̄: 3.34% x̃: 0.41% 95% mean confidence interval for cycles value: -333.68 39.03 95% mean confidence interval for cycles %-change: -3.51% -2.41% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 5964 -> 5964 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 6909 -> 6909 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 total sends in shared programs: 1040266 -> 1040266 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 3 GAINED: 1 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24553>	2023-09-06 14:47:40 +00:00
Lionel Landwerlin	10e75aae1b	intel/nir: rerun lower_tex if it lowers something nir_lower_tex can lower tg4 coords into tg4 offset which on DG2+ we also need to lower into constant offsets. Unfortunately the nir_lower_tex pass is not able to lower the instructions it itself generates, so the easy fix for when nir_lower_tex lowers tg4 coords into tg4 offsets is to rerun the pass. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9735 Cc: mesa-stable Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Tested-by: Yiwei Zhang <zzyiwei@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25015>	2023-09-05 13:35:51 +00:00
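The "rerun the pass" fix in the commit above can be sketched as follows. This is a toy model, not nir_lower_tex: the instruction names are placeholder strings and both function names are hypothetical. It assumes only what the message states, namely that a single run cannot lower instructions that the same run just generated.

```python
def lower_tex_once(instrs):
    """One run of a toy lowering pass; returns (new_instrs, progress)."""
    out = []
    progress = False
    for ins in instrs:
        if ins == "tg4_coords":
            # First lowering step produces an instruction that itself
            # needs lowering, but this run will not revisit it.
            out.append("tg4_offset")
            progress = True
        elif ins == "tg4_offset":
            out.append("tg4_const_offsets")
            progress = True
        else:
            out.append(ins)
    return out, progress


def lower_tex(instrs):
    """Run the pass, and run it again if the first run lowered something."""
    instrs, progress = lower_tex_once(instrs)
    if progress:
        instrs, _ = lower_tex_once(instrs)
    return instrs


print(lower_tex(["tg4_coords", "tex"]))
```

A single call to `lower_tex_once` leaves a `tg4_offset` behind, while the rerun in `lower_tex` finishes the job.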
Matt Turner	28c1053c07	intel: Allow using intel_clc from the system With -Dintel-clc=system, the build system will search for an `intel_clc` binary and use it instead of building `intel_clc` itself. This allows Intel Vulkan ray tracing support to be built when cross compiling without terrible hacks (that would otherwise be necessary due to `intel_clc`'s dependence on SPIRV-LLVM-Translator, libclc, clang, and LLVM). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24983>	2023-09-01 21:36:02 +00:00
Alyssa Rosenzweig	f80c57c38f	treewide: Use nir_before/after_impl for more elaborate cases Via Coccinelle patch: @@ expression func_impl; @@ -nir_before_block(nir_start_block(func_impl)) +nir_before_impl(func_impl) @@ expression func_impl; @@ -nir_after_block(nir_impl_last_block(func_impl)) +nir_after_impl(func_impl) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24910>	2023-08-30 19:30:58 +00:00
Alyssa Rosenzweig	25cc04c59b	treewide: Use nir_before/after_impl in easy cases These open-code the same idiom as the helper. Via Coccinelle patch: @@ expression func_impl; @@ -nir_before_cf_list(&func_impl->body) +nir_before_impl(func_impl) @@ expression func_impl; @@ -nir_after_cf_list(&func_impl->body) +nir_after_impl(func_impl) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24910>	2023-08-30 19:30:58 +00:00
Karol Herbst	202fe3de31	intel/compiler: drop 64 bit handling for cl workgroup intrinsics Signed-off-by: Karol Herbst <git@karolherbst.de> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24905>	2023-08-30 07:04:33 +00:00
Lionel Landwerlin	74a40cc4b6	intel/fs: move lower of non-uniform at_sample barycentric to NIR We use a non-uniform lowering loop in the backend which we can do better in NIR because we can also use divergence analysis there. This change also limits VGRF usage to a single VGRF to hold the sample ID in the backend. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>	2023-08-29 23:19:13 +00:00
Lionel Landwerlin	68027bd38e	intel/fs: implement dynamic interpolation mode for dynamic persample shaders There is no restriction for query per sample positions from the interpolator when in non-per-sample dispatch mode. But apparently that's not giving us the expected values for fragment shaders compiled without per-sample dispatch knowledge (graphics pipeline libraries). So when per-sample dispatch is dynamic and we're doing at_sample interpolation, turn the interpolation back into at_offset at runtime when we detect that the fragment shader is not run per sample. Fixes a bunch of dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `d8dfd153c5` ("intel/fs: Make per-sample and coarse dispatch tri-state") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>	2023-08-29 23:19:13 +00:00
Lionel Landwerlin	9bf2a89127	intel/compiler: fix dynamic alpha-to-coverage handling Got the wrong logic operation. Let's reuse the nicer NIR builder helper. Fixes a bunch of KHR-GL46.sample_variables.mask.rgba8..samples.mask* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `fd7debc8bb` ("intel/fs: make alpha_to_coverage a tristate") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9568 Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>	2023-08-29 23:19:12 +00:00
Lionel Landwerlin	d74c301026	intel/compiler: disable per-sample interpolation modes with non-per-sample dispatch Fixes hangs in dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `5644011f06` ("intel/compiler: Convert wm_prog_key::persample_interp to a tri-state") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>	2023-08-29 23:19:12 +00:00
Ian Romanick	927a24db14	intel/fs: New VGRF packing scheme for constant combining Each block is processed separately. VGRF channels that are allocated to values that are only used in a particular block are made available in other blocks. This is almost always an improvement, but there are some pessimal cases where it goes horribly wrong. Imagine a shader with two blocks. In that shader, the first block has 5 constants used in the first block and the second block. Three other constants are only used in the first block. The second block has 15 constants that are used only in the block. The static VGRF usage is 3 regardless of packing. However, scheduling may be able to shorten the live range of the first VGRF when it only has values that came from the first block (because three of the values are dead on entry to the second block). This used to occur in a Mad Max shader on Broadwell. That shader went from 0:0 spills:fills to 107:52. Some changes over the last year, I'm assuming !13734, have prevented this case from occurring. This change created a lot of churn on Haswell and Ivy Bridge. This seems to be primarily due to all the extra constants used for coissue, but I did not investigate very deeply. On older platforms, there were no changes to spills or fills. As a result, this is only used on Broadwell and newer platforms. v2: Update expected checksum for pixmark-piano-v2.trace on gl-zink-anv-tgl. See #9714 for more details.
shader-db results: Tiger Lake total instructions in shared programs: 21101332 -> 21102084 (<.01%) instructions in affected programs: 863686 -> 864438 (0.09%) helped: 463 / HURT: 437 total cycles in shared programs: 790573225 -> 790664391 (0.01%) cycles in affected programs: 92546803 -> 92637969 (0.10%) helped: 558 / HURT: 629 total spills in shared programs: 3959 -> 3951 (-0.20%) spills in affected programs: 184 -> 176 (-4.35%) helped: 2 / HURT: 0 total fills in shared programs: 2639 -> 2631 (-0.30%) fills in affected programs: 184 -> 176 (-4.35%) helped: 2 / HURT: 0 LOST: 1 GAINED: 5 Ice Lake and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 19945216 -> 19944711 (<.01%) instructions in affected programs: 139569 -> 139064 (-0.36%) helped: 66 / HURT: 3 total cycles in shared programs: 858410082 -> 857381323 (-0.12%) cycles in affected programs: 383825958 -> 382797199 (-0.27%) helped: 1012 / HURT: 1055 total spills in shared programs: 6190 -> 6116 (-1.20%) spills in affected programs: 891 -> 817 (-8.31%) helped: 66 / HURT: 3 total fills in shared programs: 7382 -> 7238 (-1.95%) fills in affected programs: 1538 -> 1394 (-9.36%) helped: 66 / HURT: 3 LOST: 5 GAINED: 8 Broadwell total instructions in shared programs: 17820886 -> 17812515 (-0.05%) instructions in affected programs: 800512 -> 792141 (-1.05%) helped: 385 / HURT: 1 total cycles in shared programs: 904482935 -> 903102070 (-0.15%) cycles in affected programs: 422427015 -> 421046150 (-0.33%) helped: 1091 / HURT: 812 total spills in shared programs: 17908 -> 16576 (-7.44%) spills in affected programs: 9459 -> 8127 (-14.08%) helped: 386 / HURT: 0 total fills in shared programs: 25397 -> 22354 (-11.98%) fills in affected programs: 15504 -> 12461 (-19.63%) helped: 385 / HURT: 1 LOST: 2 GAINED: 2 No shader-db changes on Haswell or older platforms. 
fossil-db results: Tiger Lake Instructions in all programs: 156881463 -> 156890970 (+0.0%) Instructions helped: 9033 Instructions hurt: 10285 Cycles in all programs: 7532597466 -> 7529647924 (-0.0%) Cycles helped: 10548 Cycles hurt: 13667 Spills in all programs: 5490 -> 5110 (-6.9%) Spills helped: 100 Spills hurt: 3 Fills in all programs: 6123 -> 5752 (-6.1%) Fills helped: 100 Fills hurt: 3 Gained: 17 Lost: 47 Ice Lake Instructions in all programs: 141309644 -> 141309603 (-0.0%) Instructions helped: 9 Instructions hurt: 4 Cycles in all programs: 9095812690 -> 9097008049 (+0.0%) Cycles helped: 14288 Cycles hurt: 16381 Spills in all programs: 7418 -> 7404 (-0.2%) Spills helped: 9 Spills hurt: 4 Fills in all programs: 8326 -> 8321 (-0.1%) Fills helped: 9 Fills hurt: 4 Skylake Instructions in all programs: 131872347 -> 131870690 (-0.0%) Instructions helped: 111 Instructions hurt: 3 Cycles in all programs: 8800835649 -> 8802483884 (+0.0%) Cycles helped: 9415 Cycles hurt: 9678 Spills in all programs: 6917 -> 6476 (-6.4%) Spills helped: 111 Spills hurt: 3 Fills in all programs: 7584 -> 7354 (-3.0%) Fills helped: 111 Fills hurt: 3 Lost: 5 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>	2023-08-29 19:01:37 +00:00
Ian Romanick	c506d7e511	intel/fs: Combine constants for integer instructions too v2: Remove type change for SHR with negation. This was a leftover from a previous attempt to deal with SHR and negation. Now all right-shifts with unsigned parameters are marked as not being able to have source modifiers. v3: Disallow negations on right shifts of unsigned sources by setting the no_negations flag in add_candidate_immediate. This eliminates the need to exclude SHR in can_do_source_mods. Tiger Lake total instructions in shared programs: 21102817 -> 21099443 (-0.02%) instructions in affected programs: 296796 -> 293422 (-1.14%) helped: 92 / HURT: 356 total cycles in shared programs: 790564691 -> 790393358 (-0.02%) cycles in affected programs: 36456886 -> 36285553 (-0.47%) helped: 171 / HURT: 286 total spills in shared programs: 3951 -> 3959 (0.20%) spills in affected programs: 176 -> 184 (4.55%) helped: 0 / HURT: 2 total fills in shared programs: 2631 -> 2639 (0.30%) fills in affected programs: 176 -> 184 (4.55%) helped: 0 / HURT: 2 LOST: 0 GAINED: 4 Ice Lake total instructions in shared programs: 19954204 -> 19949122 (-0.03%) instructions in affected programs: 40301 -> 35219 (-12.61%) helped: 23 / HURT: 2 total cycles in shared programs: 858377735 -> 858462082 (<.01%) cycles in affected programs: 75537286 -> 75621633 (0.11%) helped: 124 / HURT: 319 total spills in shared programs: 6255 -> 6190 (-1.04%) spills in affected programs: 392 -> 327 (-16.58%) helped: 1 / HURT: 2 total fills in shared programs: 7813 -> 7382 (-5.52%) fills in affected programs: 942 -> 511 (-45.75%) helped: 1 / HURT: 2 LOST: 0 GAINED: 3 Skylake total instructions in shared programs: 18049362 -> 18044440 (-0.03%) instructions in affected programs: 48317 -> 43395 (-10.19%) helped: 26 / HURT: 2 total cycles in shared programs: 844884806 -> 844915655 (<.01%) cycles in affected programs: 76137133 -> 76167982 (0.04%) helped: 171 / HURT: 293 total spills in shared programs: 6148 -> 6149 (0.02%) spills in 
affected programs: 595 -> 596 (0.17%) helped: 4 / HURT: 2 total fills in shared programs: 7484 -> 7067 (-5.57%) fills in affected programs: 1226 -> 809 (-34.01%) helped: 4 / HURT: 2 LOST: 0 GAINED: 8 Broadwell total instructions in shared programs: 17826844 -> 17821805 (-0.03%) instructions in affected programs: 60687 -> 55648 (-8.30%) helped: 28 / HURT: 8 total cycles in shared programs: 905332682 -> 904369499 (-0.11%) cycles in affected programs: 76743509 -> 75780326 (-1.26%) helped: 179 / HURT: 225 total spills in shared programs: 17922 -> 17908 (-0.08%) spills in affected programs: 2495 -> 2481 (-0.56%) helped: 6 / HURT: 8 total fills in shared programs: 26290 -> 25397 (-3.40%) fills in affected programs: 2606 -> 1713 (-34.27%) helped: 8 / HURT: 6 LOST: 1 GAINED: 1 Haswell total instructions in shared programs: 16678878 -> 16674444 (-0.03%) instructions in affected programs: 78458 -> 74024 (-5.65%) helped: 87 / HURT: 6 total cycles in shared programs: 880189381 -> 880301043 (0.01%) cycles in affected programs: 29956463 -> 30068125 (0.37%) helped: 169 / HURT: 163 total spills in shared programs: 14428 -> 14378 (-0.35%) spills in affected programs: 2384 -> 2334 (-2.10%) helped: 8 / HURT: 6 total fills in shared programs: 16975 -> 16881 (-0.55%) fills in affected programs: 1334 -> 1240 (-7.05%) helped: 10 / HURT: 4 Ivy Bridge total instructions in shared programs: 15706048 -> 15706035 (<.01%) instructions in affected programs: 9941 -> 9928 (-0.13%) helped: 13 / HURT: 0 total cycles in shared programs: 433618834 -> 433624637 (<.01%) cycles in affected programs: 12926714 -> 12932517 (0.04%) helped: 52 / HURT: 41 Sandy Bridge total cycles in shared programs: 741223552 -> 741223443 (<.01%) cycles in affected programs: 19814 -> 19705 (-0.55%) helped: 14 / HURT: 0 No changes on Iron Lake or GM45 fossil-db changes: Tiger Lake Instructions in all programs: 156858030 -> 156905532 (+0.0%) Instructions helped: 3915 Instructions hurt: 15411 Cycles in all programs: 7529667771 
-> 7532117340 (+0.0%) Cycles helped: 10260 Cycles hurt: 9990 Spills in all programs: 5610 -> 5457 (-2.7%) Spills helped: 18 Fills in all programs: 6274 -> 6091 (-2.9%) Fills helped: 18 Gained: 2 Lost: 16 Ice Lake Instructions in all programs: 141308082 -> 141303083 (-0.0%) Instructions helped: 574 Instructions hurt: 172 Cycles in all programs: 9091361325 -> 9094622766 (+0.0%) Cycles helped: 8764 Cycles hurt: 11702 Spills in all programs: 7531 -> 7385 (-1.9%) Spills helped: 19 Fills in all programs: 8462 -> 8294 (-2.0%) Fills helped: 19 Gained: 22 Lost: 15 Skylake Instructions in all programs: 131872162 -> 131867263 (-0.0%) Instructions helped: 566 Instructions hurt: 172 Cycles in all programs: 8795095440 -> 8799676943 (+0.1%) Cycles helped: 8333 Cycles hurt: 12182 Spills in all programs: 7006 -> 6884 (-1.7%) Spills helped: 13 Fills in all programs: 7696 -> 7552 (-1.9%) Fills helped: 13 Gained: 24 Lost: 1 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>	2023-08-29 19:01:36 +00:00
Ian Romanick	64c251bb3a	intel/fs: Combine constants for SEL instructions too It is very common to have bcsel where the second and third sources are both constants. This results in a situation where we would want to emit a SEL with two constant sources, but that's not allowed. Previously, we would load both constants into registers, then let constant propagation copy the last constant into the SEL instruction. This results in the constant using an entire SIMD register instead of a single channel. Instead, copy propagate both sources, then let the combine-constants pass do its thing. In the worst case, this stores the constant in a single channel of the SIMD register. In the best case, it reuses a value that was loaded into a register to satisfy another instruction. shader-db results: Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) total instructions in shared programs: 19951549 -> 19948709 (-0.01%) instructions in affected programs: 482795 -> 479955 (-0.59%) helped: 1184 / HURT: 3 total cycles in shared programs: 858584724 -> 858205341 (-0.04%) cycles in affected programs: 356168375 -> 355788992 (-0.11%) helped: 1448 / HURT: 1195 total spills in shared programs: 6569 -> 6255 (-4.78%) spills in affected programs: 912 -> 598 (-34.43%) helped: 58 / HURT: 0 total fills in shared programs: 8218 -> 7813 (-4.93%) fills in affected programs: 1570 -> 1165 (-25.80%) helped: 58 / HURT: 0 LOST: 6 GAINED: 16 Broadwell total instructions in shared programs: 17819660 -> 17819389 (<.01%) instructions in affected programs: 1078129 -> 1077858 (-0.03%) helped: 1067 / HURT: 304 total cycles in shared programs: 904722624 -> 905035016 (0.03%) cycles in affected programs: 362583117 -> 362895509 (0.09%) helped: 1381 / HURT: 1123 total spills in shared programs: 17884 -> 17922 (0.21%) spills in affected programs: 5088 -> 5126 (0.75%) helped: 55 / HURT: 152 total fills in shared programs: 25533 -> 26290 (2.96%) fills in affected programs: 12992 -> 13749 (5.83%) 
helped: 61 /HURT: 295 LOST: 7 GAINED: 24 Haswell total instructions in shared programs: 16678080 -> 16673976 (-0.02%) instructions in affected programs: 1162893 -> 1158789 (-0.35%) helped: 1584 / HURT: 7 total cycles in shared programs: 880180082 -> 879932525 (-0.03%) cycles in affected programs: 364067522 -> 363819965 (-0.07%) helped: 1226 / HURT: 976 total spills in shared programs: 14937 -> 14428 (-3.41%) spills in affected programs: 7866 -> 7357 (-6.47%) helped: 351 / HURT: 5 total fills in shared programs: 17572 -> 16975 (-3.40%) fills in affected programs: 11028 -> 10431 (-5.41%) helped: 350 / HURT: 3 LOST: 8 GAINED: 16 Ivy Bridge total instructions in shared programs: 15704044 -> 15703158 (<.01%) instructions in affected programs: 304513 -> 303627 (-0.29%) helped: 707 / HURT: 0 total cycles in shared programs: 433560149 -> 433471118 (-0.02%) cycles in affected programs: 19299650 -> 19210619 (-0.46%) helped: 687 / HURT: 395 LOST: 2 GAINED: 9 Sandy Bridge total instructions in shared programs: 13913386 -> 13912884 (<.01%) instructions in affected programs: 195687 -> 195185 (-0.26%) helped: 455 / HURT: 0 total cycles in shared programs: 741156272 -> 741136266 (<.01%) cycles in affected programs: 10934349 -> 10914343 (-0.18%) helped: 578 / HURT: 289 LOST: 9 GAINED: 4 Iron Lake and GM45 had similar results. 
(Iron Lake shown) total instructions in shared programs: 8364056 -> 8364042 (<.01%) instructions in affected programs: 5178 -> 5164 (-0.27%) helped: 10 / HURT: 0 total cycles in shared programs: 248759794 -> 248757940 (<.01%) cycles in affected programs: 4305246 -> 4303392 (-0.04%) helped: 183 / HURT: 24 fossil-db results: Tiger Lake Instructions in all programs: 156943594 -> 156802601 (-0.1%) Instructions helped: 20595 Instructions hurt: 23248 Cycles in all programs: 7512086950 -> 7528386387 (+0.2%) Cycles helped: 29531 Cycles hurt: 27837 Spills in all programs: 13500 -> 5643 (-58.2%) Spills helped: 394 Spills hurt: 22 Fills in all programs: 18943 -> 6306 (-66.7%) Fills helped: 394 Fills hurt: 11 Gained: 93 Lost: 76 Ice Lake Instructions in all programs: 141395899 -> 141249621 (-0.1%) Instructions helped: 30067 Instructions hurt: 3 Cycles in all programs: 9097127057 -> 9089668235 (-0.1%) Cycles helped: 32268 Cycles hurt: 24315 Spills in all programs: 13695 -> 7564 (-44.8%) Spills helped: 403 Fills in all programs: 18400 -> 8494 (-53.8%) Fills helped: 403 Gained: 114 Lost: 137 Skylake Instructions in all programs: 131948328 -> 131826063 (-0.1%) Instructions helped: 29968 Instructions hurt: 3 Cycles in all programs: 8794778440 -> 8793934844 (-0.0%) Cycles helped: 32705 Cycles hurt: 23575 Spills in all programs: 10526 -> 7039 (-33.1%) Spills helped: 403 Fills in all programs: 11025 -> 7728 (-29.9%) Fills helped: 403 Gained: 102 Lost: 250 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>	2023-08-29 19:01:36 +00:00
Ian Romanick	44d62a5224	intel/fs: Completely re-write the combine constants pass This is a squash of what in the original MR was "util: Add generic pass that tries to combine constants" and "intel/fs: Switch to using util_combine_constants". The new algorithm uses a multi-pass greedy algorithm that attempts to collect constants for loading in order of increasing degrees of freedom. The first pass collects constants that must be emitted as-is (e.g., without source modifiers). The second pass emits all constants that must be emitted (because they are used in a source field that cannot be a literal constant) but that can have a source modifier. The final pass possibly emits constants that may not have to be emitted. This is used for instructions where one of the fields is allowed to be a constant. This is not used in the current commit, but future commits that enable SEL will use this. The SEL instruction can have a single constant, but when both sources are constant, one of the sources has to be loaded into a register. By loading constants in this order, required "choices" made in earlier passes may be re-used in later passes. This provides a more optimal result. At this point in the series, most platforms have the same results with the new implementation. Gen7 platforms see a significant number of "small" changes. Due to the coissue optimization on Gen7, each shader is likely to have most constants affected by constant combining. If a shader has only a single basic block, constants are packed into registers in the order produced by the constant combining process. Since each constant has a different live range in the shader, even slightly different packing orders can have dramatic effects on the live range of a register. Even in cases where this does not affect register pressure in a meaningful way, it can cause the scheduler to make very different choices about the ordering of instructions. From my analysis (using the `if (debug) { ... }` block at the end of fs_visitor::opt_combine_constants), the old implementation and the new implementation pick the same set of constants, but the order produced may be slightly different. For the smaller number of values in non-Gfx7 shaders, the orders are similar enough to not matter. No shader-db or fossil-db changes on any non-Gfx7 platforms. Haswell and Ivy Bridge had similar results. (Haswell shown) total cycles in shared programs: 879930036 -> 880001666 (<.01%) cycles in affected programs: 22485040 -> 22556670 (0.32%) helped: 1879 HURT: 2309 helped stats (abs) min: 1 max: 6296 x̄: 258.54 x̃: 34 helped stats (rel) min: <.01% max: 54.63% x̄: 3.88% x̃: 0.87% HURT stats (abs) min: 1 max: 9739 x̄: 241.41 x̃: 40 HURT stats (rel) min: <.01% max: 160.50% x̄: 6.01% x̃: 0.99% 95% mean confidence interval for cycles value: -1.04 35.25 95% mean confidence interval for cycles %-change: 1.23% 1.92% Inconclusive result (value mean confidence interval includes 0). LOST: 82 GAINED: 39 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698>	2023-08-29 19:01:36 +00:00
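The pass ordering described in 44d62a5224 (fewest degrees of freedom first, so later passes can reuse earlier choices) can be sketched roughly as below. This is an illustration only, not the Mesa implementation: `struct candidate`, `struct sel_pair`, `struct const_table`, and `combine_constants_sketch()` are hypothetical names, constants are modeled as plain doubles, and negation stands in for source modifiers in general.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSTS 32

/* Hypothetical model of an immediate that must be loaded into a register. */
struct candidate {
   double value;
   bool allow_negate; /* true if the use can apply a negate source modifier */
};

/* Hypothetical model of a SEL whose two sources are both immediates:
 * one may remain a literal, the other needs a register. */
struct sel_pair {
   double a, b;
};

struct const_table {
   double values[MAX_CONSTS];
   size_t count;
};

static bool table_has(const struct const_table *t, double v)
{
   for (size_t i = 0; i < t->count; i++) {
      if (t->values[i] == v)
         return true;
   }
   return false;
}

static void table_add(struct const_table *t, double v)
{
   if (table_has(t, v))
      return;
   assert(t->count < MAX_CONSTS);
   t->values[t->count++] = v;
}

/* Returns the number of constants that end up loaded into registers. */
static size_t combine_constants_sketch(const struct candidate *c, size_t n,
                                       const struct sel_pair *p, size_t np,
                                       struct const_table *t)
{
   t->count = 0;

   /* Pass 1: values that must be loaded exactly as written. */
   for (size_t i = 0; i < n; i++) {
      if (!c[i].allow_negate)
         table_add(t, c[i].value);
   }

   /* Pass 2: values that must be loaded, but whose negation is reusable. */
   for (size_t i = 0; i < n; i++) {
      if (c[i].allow_negate && !table_has(t, -c[i].value))
         table_add(t, c[i].value);
   }

   /* Pass 3: only one source of each pair needs a register, so prefer a
    * value that an earlier pass already loaded. */
   for (size_t i = 0; i < np; i++) {
      if (!table_has(t, p[i].a) && !table_has(t, p[i].b))
         table_add(t, p[i].a);
   }

   return t->count;
}
```

Running the optional pairs last is what lets them reuse a required load instead of adding a new one; reversing the pass order in this sketch could load a value that a later, more constrained use cannot share.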
Alyssa Rosenzweig	cda1961835	treewide: Also handle struct nir_builder form Via Coccinelle patch: @def@ typedef bool; typedef nir_builder; typedef nir_instr; typedef nir_def; identifier fn, instr, intr, x, builder, data; @@ static fn(struct nir_builder* builder, -nir_instr instr, +nir_intrinsic_instr intr, ...) { ( - if (instr->type != nir_instr_type_intrinsic) - return false; - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); \| - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); - if (instr->type != nir_instr_type_intrinsic) - return false; ) <... ( -instr->x +intr->instr.x \| -instr +&intr->instr ) ...> } @pass depends on def@ identifier def.fn; expression shader, progress; @@ ( -nir_shader_instructions_pass(shader, fn, +nir_shader_intrinsics_pass(shader, fn, ...) \| -NIR_PASS_V(shader, nir_shader_instructions_pass, fn, +NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn, ...) \| -NIR_PASS(progress, shader, nir_shader_instructions_pass, fn, +NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn, ...) ) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852>	2023-08-24 15:48:02 +00:00
Alyssa Rosenzweig	465b138f01	treewide: Use nir_shader_intrinsic_pass sometimes This converts a lot of trivial passes. Nice boilerplate deletion. Via Coccinelle patch (with a small manual fix-up for panfrost where coccinelle got confused by genxml + ninja clang-format squashed in, and for Zink because my semantic patch was slightly buggy). @def@ typedef bool; typedef nir_builder; typedef nir_instr; typedef nir_def; identifier fn, instr, intr, x, builder, data; @@ static fn(nir_builder* builder, -nir_instr instr, +nir_intrinsic_instr intr, ...) { ( - if (instr->type != nir_instr_type_intrinsic) - return false; - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); \| - nir_intrinsic_instr intr = nir_instr_as_intrinsic(instr); - if (instr->type != nir_instr_type_intrinsic) - return false; ) <... ( -instr->x +intr->instr.x \| -instr +&intr->instr ) ...> } @pass depends on def@ identifier def.fn; expression shader, progress; @@ ( -nir_shader_instructions_pass(shader, fn, +nir_shader_intrinsics_pass(shader, fn, ...) \| -NIR_PASS_V(shader, nir_shader_instructions_pass, fn, +NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn, ...) \| -NIR_PASS(progress, shader, nir_shader_instructions_pass, fn, +NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn, ...) ) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852>	2023-08-24 15:48:02 +00:00
Yonggang Luo	0b84e38684	intel/brw: use 4 instead of MAX_VERTEX_STREAMS to avoid #include "mesa/main/config.h" Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24824>	2023-08-24 02:54:08 +00:00
Kenneth Graunke	08fc4603dd	intel/fs: Dump IR for pre-RA scheduler modes in DEBUG_OPTIMIZER This lets us more easily compare and contrast the various scheduling options that the compiler considered. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Kenneth Graunke	07f2ad32e4	intel/fs: Pick the lowest register pressure schedule when spilling We try various pre-RA scheduler modes and see if any of them allow us to register allocate without spilling. If all of them spill, however, we left it on the last mode: LIFO. This is unfortunately sometimes significantly worse than other modes (such as "none"). This patch makes us instead select the pre-RA scheduling mode that gives the lowest register pressure estimate, if none of them manage to avoid spilling. The hope is that this scheduling will spill the least out of all of them. fossil-db stats (on Alchemist) speak for themselves: Totals: Instrs: 197297092 -> 195326552 (-1.00%); split: -1.02%, +0.03% Cycles: 14291286956 -> 14303502596 (+0.09%); split: -0.55%, +0.64% Spill count: 190886 -> 129204 (-32.31%); split: -33.01%, +0.70% Fill count: 361408 -> 225038 (-37.73%); split: -39.17%, +1.43% Scratch Memory Size: 12935168 -> 10868736 (-15.98%); split: -16.08%, +0.10% Totals from 1791 (0.27% of 668386) affected shaders: Instrs: 7628929 -> 5658389 (-25.83%); split: -26.50%, +0.67% Cycles: 719326691 -> 731542331 (+1.70%); split: -10.95%, +12.65% Spill count: 110627 -> 48945 (-55.76%); split: -56.96%, +1.20% Fill count: 221560 -> 85190 (-61.55%); split: -63.89%, +2.34% Scratch Memory Size: 4471808 -> 2405376 (-46.21%); split: -46.51%, +0.30% Improves performance when using XeSS in Cyberpunk 2077 by 90% on A770. Improves performance of Borderlands 3 by 1.54% on A770. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
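The selection rule from 07f2ad32e4 can be sketched as follows. This is a simplified model under stated assumptions: `struct mode_result` and `pick_schedule()` are hypothetical names, the per-mode results are assumed to be precomputed (the real compiler tries modes one at a time), and the mode labels other than "none" and "lifo" are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of trying one pre-RA scheduling mode. */
struct mode_result {
   const char *name;  /* label for debugging only */
   unsigned pressure; /* estimated register pressure of the schedule */
   bool allocates;    /* true if register allocation succeeded without spilling */
};

/* Returns the index of the mode to keep. Modes are listed in order of
 * preference; the first one that allocates without spilling wins.
 * If every mode spills, fall back to the lowest pressure estimate
 * rather than simply keeping the last mode tried. */
static size_t pick_schedule(const struct mode_result *r, size_t n)
{
   assert(n > 0);

   size_t best = 0;
   for (size_t i = 0; i < n; i++) {
      if (r[i].allocates)
         return i;
      if (r[i].pressure < r[best].pressure)
         best = i;
   }
   return best;
}
```

With three modes that all spill and pressure estimates of 140, 120, and 128, the sketch keeps the second mode; the previous behavior described in the commit message corresponds to always keeping the last one.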
Kenneth Graunke	158ac265df	intel/fs: Make helpers for saving/restoring instruction order This moves a bit of code out of a large function, but also lets us reuse it a few extra places in the next commit. I opted to stop using ralloc here since this is short-lived data that doesn't need to stick around for the rest of the compile, and it's easy enough to free. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Kenneth Graunke	2dd56921c9	intel/fs: Index scheduler mode string table by mode enum pre_modes[] is an array with the modes ordered in our desired preference. scheduler_mode_name[] was also in that order, and the two had to be kept in sync. This is a little silly; we should just have a mode enum -> string table and look it up via the enum. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
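The idea in 2dd56921c9, one name table indexed by the enum and a separate preference array, can be sketched like this. The enum values and strings below are placeholders, not the actual Mesa definitions; only the array names `pre_modes[]` and `scheduler_mode_name[]` come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder mode enum; the real enum has different members. */
enum sched_mode {
   MODE_A,
   MODE_B,
   MODE_NONE,
   MODE_COUNT,
};

/* Preference order, which need not match the enum order. */
static const enum sched_mode pre_modes[] = { MODE_B, MODE_A, MODE_NONE };

/* Name table indexed by the enum itself, so reordering pre_modes[]
 * can never desynchronize names from modes. */
static const char *const scheduler_mode_name[MODE_COUNT] = {
   [MODE_A]    = "mode-a",
   [MODE_B]    = "mode-b",
   [MODE_NONE] = "none",
};

/* Name of the i-th preferred mode. */
static const char *preferred_mode_name(size_t i)
{
   assert(i < sizeof(pre_modes) / sizeof(pre_modes[0]));
   return scheduler_mode_name[pre_modes[i]];
}
```

Designated initializers keep each string next to its enum value, so adding a mode touches one table entry instead of two parallel arrays.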
Kenneth Graunke	7eba19245d	intel/compiler: Move SCHEDULE_NONE handling into schedule_instructions() I'm going to introduce another call site for this function, and just handling SCHEDULE_NONE in the scheduler itself makes more sense than duplicating the logic. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
Kenneth Graunke	743fd60bea	intel/fs: Account for payload GRFs when calculating register pressure The register pressure analysis I wrote in 2013 only considered VGRFs, and not other GRFs, such as payload registers and push constants. We need to consider those too, because payload registers definitely occupy space and add to pressure. In 2015, Connor already made the scheduler account for this, so the only real use for this is in shader statistic dumps and optimizer printouts. But we should make it more accurate. (We will use it in more places shortly, a few commits from now.) Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>	2023-08-23 21:34:38 +00:00
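A rough way to picture the change in 743fd60bea: pressure at an instruction is the payload registers plus the sizes of the live VGRFs, not the VGRFs alone. The sketch below assumes, as a simplification not stated in the commit, that payload registers stay occupied for the whole program; `struct vgrf_range` and `pressure_at()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical live range of one virtual register. */
struct vgrf_range {
   unsigned size; /* number of hardware registers it occupies */
   int start;     /* first instruction index where it is live */
   int end;       /* last instruction index where it is live (inclusive) */
};

/* Estimated register pressure at instruction `ip`.
 * `payload_regs` models payload and push-constant registers, treated
 * here as occupied throughout (an assumption of this sketch). */
static unsigned pressure_at(const struct vgrf_range *v, size_t n,
                            unsigned payload_regs, int ip)
{
   assert(v != NULL || n == 0);

   unsigned regs = payload_regs;
   for (size_t i = 0; i < n; i++) {
      if (v[i].start <= ip && ip <= v[i].end)
         regs += v[i].size;
   }
   return regs;
}
```

Passing `payload_regs = 0` reproduces a VGRF-only estimate, which makes the difference between the two accounting schemes easy to compare.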
Emma Anholt	5bd0750921	intel/fs: Simplify compute_start_end(). Now that we have moved the screening up, we can simplify the code. No change in shader-db steam performance, n=10. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24702>	2023-08-22 23:34:30 +00:00
Emma Anholt	2b01246f49	intel/fs: Move the defin[]/defout[] screening up to livein[]/liveout[] setup. This keeps us from having to run the loop to propagate up quite so much. steam shader-db time -1.86356% +/- 0.941498% (n=10). There's a small scheduling effect, since previously the scheduler wasn't considering defin/defout: cycles helped: shaders/closed/steam/amnesia-the-dark-descent/high/241.shader_test FS SIMD16: 11428 -> 11422 (-0.05%) (scheduled: scheduled) cycles helped: shaders/humus-volumetricfogging2/1.shader_test FS SIMD32: 13832 -> 13800 (-0.23%) (scheduled: scheduled) cycles helped: shaders/tesseract/479.shader_test FS SIMD32: 9330 -> 8644 (-7.35%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/android/angle/aztec_ruins/36.shader_test FS SIMD32: 7870 -> 7940 (0.89%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/gfxbench5/gl_5_high_off/57.shader_test FS SIMD32: 7870 -> 7940 (0.89%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/gfxbench5/gl_5_normal_off/54.shader_test FS SIMD32: 7870 -> 7940 (0.89%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/android/angle/aztec_ruins/30.shader_test FS SIMD32: 8726 -> 8808 (0.94%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/gfxbench5/gl_5_high_off/51.shader_test FS SIMD32: 8726 -> 8808 (0.94%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/gfxbench5/gl_5_normal_off/48.shader_test FS SIMD32: 8726 -> 8808 (0.94%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/gfxbench5/gl_4_off/129.shader_test TCS SIMD8: 3911 -> 3979 (1.74%) (scheduled: scheduled) cycles HURT: shaders/robclark-shaders/gfxbench5/gl_4_off/109.shader_test TCS SIMD8: 3911 -> 3979 (1.74%) (scheduled: scheduled) total cycles in shared programs: 313096438 -> 313096306 (<.01%) cycles in affected programs: 92200 -> 92068 (-0.14%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24702>	2023-08-22 23:34:30 +00:00
Emma Anholt	ed4e1becea	intel/fs: Move defin/defout setup to the start of the loop. Refactor for the next commit. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24702>	2023-08-22 23:34:30 +00:00
Georg Lehmann	9cf6984200	nir: unify lower_find_msb with has_{find_msb_rev,uclz} Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Georg Lehmann	2ac7e6614a	nir: unify lower_bitfield_extract with has_bfe Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Georg Lehmann	34c3f81614	nir: unify lower_bitfield_insert with has_{bfm,bfi,bitfield_select} Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24662>	2023-08-22 12:08:37 +00:00
Faith Ekstrand	b5d6b7c402	nir: Drop most uses of nir_instr_rewrite_src() Generated by the following semantic patch: @@ expression I, S, D; @@ -nir_instr_rewrite_src(I, S, nir_src_for_ssa(D)); +nir_src_rewrite(S, D); Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24729>	2023-08-18 01:00:15 +00:00
Faith Ekstrand	de063a1481	nir: Drop most uses of nir_instr_rewrite_src_ssa() Generated with the following semantic patch: @@ expression I, S, D; @@ -nir_instr_rewrite_src_ssa(I, S, D); +nir_src_rewrite(S, D); Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24729>	2023-08-18 01:00:15 +00:00
Kenneth Graunke	d7daf78f62	intel/compiler: Respect NIR_DEBUG_PRINT_INTERNAL for DEBUG_OPTIMIZER If the NIR_DEBUG_PRINT_INTERNAL flag is not set, don't print debugging information for internal shaders in INTEL_DEBUG=optimizer dumps. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24684>	2023-08-17 18:19:53 +00:00
Matt Turner	d142c845d0	Revert "intel/fs: only avoid SIMD32 if strictly inferior in throughput" This reverts commit `6b494745be`. The logic is not entirely correct: the comparison is between two static-analysis estimates of a dynamic system with variables that aren't captured by the shader source, so using ">" will always have greater potential to cause regressions whenever the performance difference between the two builds is something not captured by the static model, no matter how much the model is improved. Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9262 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24615>	2023-08-16 14:56:15 +00:00
Faith Ekstrand	43be4129d2	nir: s/live_ssa_def/live_def/ Generated mostly with sed: sed -i -e 's/live_ssa_def/live_def/g' src/compiler/nir/nir.h src/compiler/nir/*.c Plus three fixups in various Intel drivers. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24703>	2023-08-15 17:44:27 +00:00

3754 commits