fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Rhys Perry	1e9f72ffbe	radv,aco: use lower_to_fragment_fetch This simplifies ACO and will let us optimize the FMASK fetch (for example, move it out of loops). fossil-db (Sienna Cichlid): Totals from 955 (0.64% of 150170) affected shaders: CodeSize: 4722016 -> 4722952 (+0.02%); split: -0.02%, +0.04% Instrs: 875619 -> 875760 (+0.02%); split: -0.02%, +0.04% Latency: 14069089 -> 14071699 (+0.02%); split: -0.02%, +0.04% InvThroughput: 2321419 -> 2321218 (-0.01%); split: -0.02%, +0.01% VClause: 23080 -> 23081 (+0.00%) SClause: 32426 -> 32019 (-1.26%); split: -1.88%, +0.62% Copies: 42787 -> 42777 (-0.02%); split: -0.19%, +0.16% Branches: 17900 -> 17902 (+0.01%); split: -0.04%, +0.06% PreSGPRs: 43229 -> 41002 (-5.15%); split: -5.16%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>	2021-10-07 15:36:39 +00:00
Rhys Perry	cfb816b2a5	aco: use correct dim for FMASK fetches I think it somehow worked fine previously, but this is more correct. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>	2021-10-07 15:36:39 +00:00
Rhys Perry	bf0cc05227	aco: return 0x76543210 for NULL FMASK fetch This can replace several v_cndmask_b32 with a single v_cndmask_b32, and will be useful when we lower sample index adjustment in NIR. fossil-db (Sienna Cichlid): Totals from 955 (0.64% of 150170) affected shaders: VGPRs: 53232 -> 53208 (-0.05%) CodeSize: 4712548 -> 4722016 (+0.20%); split: -0.02%, +0.23% MaxWaves: 19052 -> 19056 (+0.02%) Instrs: 875891 -> 875619 (-0.03%); split: -0.04%, +0.00% Latency: 14070164 -> 14069089 (-0.01%); split: -0.02%, +0.01% InvThroughput: 2322982 -> 2321419 (-0.07%); split: -0.08%, +0.01% VClause: 23070 -> 23080 (+0.04%); split: -0.00%, +0.05% SClause: 32463 -> 32426 (-0.11%); split: -0.12%, +0.01% Copies: 42840 -> 42787 (-0.12%); split: -0.19%, +0.07% Branches: 17907 -> 17900 (-0.04%); split: -0.06%, +0.02% PreSGPRs: 43585 -> 43229 (-0.82%) PreVGPRs: 47676 -> 47625 (-0.11%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>	2021-10-07 15:36:39 +00:00
Rhys Perry	225fe37c14	nir: add _amd suffix to fragment_mask_fetch and fragment_fetch texops Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>	2021-10-07 15:36:39 +00:00
Timur Kristóf	6ca66808b5	aco: Fix determining whether any culling is enabled. Use 0xB instead of 0x00FFFFFF - this allows jumping over the culling code when no actual culling is enabled but the ngg_cull_face_is_ccw flag is set. Fixes: `182d9b1e60` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13129>	2021-10-01 15:46:12 +00:00
Timur Kristóf	c13a8d20f7	aco: Fix small primitive precision. This is a mistake. It should use ngg_culling_settings instead of ngg_gs_state. Fixes: `182d9b1e60` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13129>	2021-10-01 15:46:12 +00:00
Tony Wasserka	0812d440c7	aco: Use std::vector for the underlying container of std::stack By default, std::stack uses std::deque to allocate its elements, which has poor cache efficiency. std::vector makes appending elements more expensive (due to potential reallocations), but in the changed contexts the element count should always be low anyway. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11925>	2021-10-01 09:39:13 +00:00
Timur Kristóf	5c35040da1	aco: Don't write m0 register for LDS instructions on GFX9+. Fossil DB stats on Sienna Cichlid: Totals from 2691 (2.09% of 128647) affected shaders: VGPRs: 124392 -> 124376 (-0.01%) CodeSize: 8192352 -> 8174620 (-0.22%); split: -0.22%, +0.00% MaxWaves: 61516 -> 61524 (+0.01%) Instrs: 1519774 -> 1514958 (-0.32%); split: -0.32%, +0.00% Latency: 14767555 -> 14766145 (-0.01%); split: -0.01%, +0.00% InvThroughput: 3394282 -> 3394173 (-0.00%); split: -0.01%, +0.00% VClause: 31985 -> 32002 (+0.05%); split: -0.02%, +0.07% SClause: 47581 -> 47539 (-0.09%); split: -0.14%, +0.05% Copies: 127533 -> 122709 (-3.78%); split: -3.80%, +0.02% Branches: 39395 -> 39390 (-0.01%) PreSGPRs: 84389 -> 82702 (-2.00%) PreVGPRs: 87520 -> 87519 (-0.00%) Fossil DB stats on Sienna Cichlid with NGGC on: Totals from 60930 (47.36% of 128647) affected shaders: VGPRs: 2180712 -> 2180696 (-0.00%) CodeSize: 169122736 -> 167474304 (-0.97%); split: -0.97%, +0.00% MaxWaves: 1703698 -> 1703706 (+0.00%) Instrs: 32301234 -> 31888743 (-1.28%); split: -1.28%, +0.00% Latency: 152526083 -> 152367301 (-0.10%); split: -0.10%, +0.00% InvThroughput: 25090218 -> 25089812 (-0.00%); split: -0.00%, +0.00% VClause: 577302 -> 577319 (+0.00%); split: -0.00%, +0.00% SClause: 801614 -> 801572 (-0.01%); split: -0.01%, +0.00% Copies: 3399700 -> 2987201 (-12.13%); split: -12.13%, +0.00% Branches: 1262859 -> 1262854 (-0.00%) PreSGPRs: 2175752 -> 2141331 (-1.58%) PreVGPRs: 1785088 -> 1785087 (-0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11224>	2021-09-29 16:00:19 +02:00
Daniel Schürmann	40a93e271c	aco: clang-format No changes, just formatting. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13087>	2021-09-28 19:48:00 +00:00
Timur Kristóf	9478901824	aco: Implement integer conversions using p_extract. Fossil DB stats on Sienna Cichlid: Totals from 563 (0.44% of 128647) affected shaders: SpillSGPRs: 1381 -> 1382 (+0.07%) SpillVGPRs: 1606 -> 1552 (-3.36%) CodeSize: 2474724 -> 2446612 (-1.14%); split: -1.15%, +0.02% Scratch: 181248 -> 180224 (-0.56%) Instrs: 440973 -> 435091 (-1.33%); split: -1.35%, +0.01% Latency: 9123609 -> 8517830 (-6.64%); split: -6.66%, +0.02% InvThroughput: 3685256 -> 3383293 (-8.19%); split: -8.22%, +0.02% VClause: 8425 -> 8372 (-0.63%) Copies: 66553 -> 66681 (+0.19%); split: -0.49%, +0.68% Branches: 13824 -> 13825 (+0.01%); split: -0.01%, +0.01% PreSGPRs: 21816 -> 21824 (+0.04%) Fossil DB stats on Sienna Cichlid with NGGC on: Totals from 58802 (45.71% of 128647) affected shaders: SpillSGPRs: 6541 -> 6542 (+0.02%) SpillVGPRs: 1606 -> 1552 (-3.36%) CodeSize: 162976608 -> 162244340 (-0.45%); split: -0.45%, +0.00% Scratch: 181248 -> 180224 (-0.56%) Instrs: 31163521 -> 31098078 (-0.21%); split: -0.21%, +0.00% Latency: 146893569 -> 144920070 (-1.34%); split: -1.34%, +0.00% InvThroughput: 25384324 -> 25035940 (-1.37%); split: -1.38%, +0.00% VClause: 552310 -> 552257 (-0.01%) Copies: 3356856 -> 3356984 (+0.00%); split: -0.01%, +0.01% Branches: 1237314 -> 1237315 (+0.00%); split: -0.00%, +0.00% PreSGPRs: 2185339 -> 2185347 (+0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11560>	2021-09-28 17:59:27 +00:00
Samuel Pitoiset	deede6b03d	radv: pass the pipeline key to the backend compilers It exactly matches the shader keys now. Everything was copied from the pipeline key to the shader keys. There is still some work to completely remove radv_shader_variant_key. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13032>	2021-09-27 11:57:25 +02:00
Bas Nieuwenhuizen	8ca54b4d38	radv: Support nir_intrinsic_load_global_constant. SPIR-V parsing can result in some direct constant usage for shader records. Lower this early to a global based intrinsic so that it doesn't interfere with the later 32-bit offset based constants for scratch usage. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>	2021-09-21 01:53:39 +00:00
Bas Nieuwenhuizen	c299968988	aco: Add support for ray launch size. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>	2021-09-21 01:53:39 +00:00
Bas Nieuwenhuizen	817553c052	aco: Implement call scope. Since we do no repacking yet, just use invocation. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>	2021-09-21 01:53:39 +00:00
Bas Nieuwenhuizen	b6be96a2bd	radv: Modify load_sbt_amd intrinsic to get the descriptor. That way we can get the address to the entry, which is needed for some nir builtins because extra data in the entry can be used as shader input. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>	2021-09-21 01:53:39 +00:00
Timur Kristóf	966cff9cfa	aco/isel: Fix emit_vop2_instruction to apply 16/24-bit flags properly. Previously it used a builder function but didn't use the return value from that function, so the flags were not applied. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12786>	2021-09-20 12:39:03 +02:00
Rhys Perry	2a7fa132be	aco: implement udot_4x8/sdot_4x8/udot_2x16/sdot_2x16 opcodes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:28 +00:00
Rhys Perry	e0d232c2fc	aco: implement nir_op_pack_32_4x8 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:28 +00:00
Rhys Perry	4dd420f76d	radv,aco: implement iadd_sat Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:28 +00:00
Daniel Schürmann	0988f7b9ba	aco: remove explicit dst_preserve flag Instead, we can rely on the fact that subdword definitions must preserve the unused bits while dword definitions either pad or sign-extend. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>	2021-09-02 20:39:17 +02:00
Daniel Schürmann	9e3ff06c38	aco: rewrite SDWA selector This commit introduces a new struct SubdwordSel in order to ease and clean up the usage of SDWA selections. This includes removing the distinction between register-allocated and fixed SDWA selections. Instead, SDWA selections can now also access the high bits of subdword variables. Alignment and sizes are validated accordingly. Size, offset and sign_extend can be evaluated via helper methods. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12640>	2021-09-02 20:39:17 +02:00
Rhys Perry	9df9fe7dfa	aco: include utility in isel For std::exchange(). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Fixes: `c1d11bb92c` ("aco: Add loop creation helpers.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5301 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12614>	2021-08-30 14:28:00 +00:00
Timur Kristóf	cfb0d931f2	aco: Emit zero for the derivatives of uniforms. Observed in a shader from Resident Evil Village. This also helps prevent emitting invalid IR. Cc: mesa-stable Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12599>	2021-08-27 20:34:22 +00:00
Daniel Schürmann	23d5865f42	aco: refactor nir_op_imul selection Previously, the optimization to use v_mul_lo_u16 for 32bit multiplications was done in instruction_selection. This was moved to the optimizer to ease some case distinctions. The mixed results are due to increased use of SDWA. Totals from 2616 (1.74% of 150170) affected shaders: (GFX10.3) VGPRs: 143888 -> 143872 (-0.01%); split: -0.02%, +0.01% CodeSize: 5604032 -> 5604080 (+0.00%); split: -0.01%, +0.01% Instrs: 1086798 -> 1083915 (-0.27%); split: -0.27%, +0.01% Latency: 8215793 -> 8213023 (-0.03%); split: -0.10%, +0.07% InvThroughput: 20765157 -> 20773766 (+0.04%); split: -0.02%, +0.06% VClause: 35256 -> 35260 (+0.01%); split: -0.02%, +0.03% SClause: 29021 -> 29024 (+0.01%); split: -0.00%, +0.01% Copies: 74163 -> 74306 (+0.19%); split: -0.05%, +0.24% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11678>	2021-08-27 19:57:59 +00:00
Timur Kristóf	5b7446d74c	radv, ac, aco: Use indices 0-2 of gs_vtx_offset argument array on GFX9+. Previously, indices 0, 2, 4 were used. This worked, but it was somewhat unintuitive. This commit changes it to use indices 0, 1, 2 instead, which makes the code easier to understand. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12511>	2021-08-26 05:20:15 +00:00
Daniel Schürmann	cd489e5388	aco: remove redundant s_and exec after nir_op_inot Totals from 22585 (15.04% of 150170) affected shaders: (GFX10.3) VGPRs: 1474048 -> 1473904 (-0.01%) CodeSize: 155238876 -> 155187688 (-0.03%); split: -0.06%, +0.03% MaxWaves: 385086 -> 385122 (+0.01%) Instrs: 29297735 -> 29284442 (-0.05%); split: -0.08%, +0.04% Latency: 675841742 -> 675764151 (-0.01%); split: -0.02%, +0.01% InvThroughput: 174859037 -> 174854796 (-0.00%); split: -0.01%, +0.01% VClause: 479790 -> 479781 (-0.00%); split: -0.01%, +0.00% SClause: 1106900 -> 1106615 (-0.03%); split: -0.03%, +0.01% Copies: 1829037 -> 1828042 (-0.05%); split: -0.09%, +0.03% Branches: 859971 -> 859967 (-0.00%); split: -0.00%, +0.00% PreSGPRs: 1341850 -> 1342356 (+0.04%); split: -0.01%, +0.04% PreVGPRs: 1327322 -> 1327034 (-0.02%) Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11573>	2021-08-25 12:43:50 +00:00
Rhys Perry	8852c5448d	aco: fix vectorized 16-bit load_input/load_interpolated_input Seems we haven't encountered this before because nir_lower_io_to_scalar_early usually scalarizes this. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12486>	2021-08-23 10:11:36 +00:00
Timur Kristóf	448592b9ae	aco: Use Navi 10 empty NGG output workaround on NGG culling shaders. Navi 10 can hang when an NGG workgroup has no output, so we work around that by always exporting a single zero-area triangle with a single vertex that has all-NaN coordinates. Thus far, we only employed this for NGG GS, because on all other stages, the output can't be empty. However, with NGG culling, the output can be empty, so let's apply the same workaround there too. Cc: mesa-stable Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12169>	2021-08-04 12:28:34 +00:00
Rhys Perry	566970f273	aco: use image_dim and image_array intrinsic indices Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12190>	2021-08-04 12:09:07 +00:00
Timur Kristóf	da9f4b2e67	nir, aco: Remove vertex and primitive count overwrite intrinsic. It's no longer needed. No Fossil DB changes. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>	2021-08-02 11:38:25 +00:00
Timur Kristóf	1bbea90f50	aco, nir, ac: Simplify sequence of getting initial NGG VS edge flags. Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags of every vertex at once. This takes fewer VALU instructions than previously. Fossil DB results on Sienna Cichlid (with NGGC on): Totals from 56917 (44.24% of 128647) affected shaders: CodeSize: 161028288 -> 158751628 (-1.41%) Instrs: 30917985 -> 30519571 (-1.29%) Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01% InvThroughput: 21280238 -> 20927401 (-1.66%) Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00% No Fossil DB changes with NGGC off. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>	2021-08-02 11:38:25 +00:00
Samuel Pitoiset	6694c37ea0	aco: implement VK_EXT_shader_atomic_float2 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12060>	2021-07-27 08:44:31 +02:00
Jason Ekstrand	e83fe65cd8	radv,radeonsi: Do cube size divide-by-6 lowering in NIR No point in carrying all this code around twice each in two back-ends. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>	2021-07-22 14:22:35 -05:00
Timur Kristóf	55d57b828f	aco: Fix how p_elect interacts with optimizations. Since p_elect doesn't have any operands, ACO's value numbering and/or the pre-RA optimizer could currently recognize two p_elect instructions in two different blocks as the same. This patch adds exec as an operand to p_elect in order to achieve correct behavior. Fixes: `e66f54e5c8` Closes: #5080 Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11943>	2021-07-18 00:48:06 +02:00
Timur Kristóf	e66f54e5c8	aco: Allow elect to take advantage of knowing when all lanes are active. Implement elect using a pseudo-op which is lowered during the insert_exec_mask pass. This makes it possible to emit a more optimal sequence when the exec mask is constant. Fossil DB results on Sienna Cichlid: Totals from 211 (0.16% of 128647) affected shaders: CodeSize: 2254356 -> 2240468 (-0.62%); split: -0.62%, +0.00% Instrs: 438471 -> 434996 (-0.79%); split: -0.80%, +0.01% Latency: 2717082 -> 2709400 (-0.28%); split: -0.28%, +0.00% InvThroughput: 566987 -> 566342 (-0.11%); split: -0.11%, +0.00% Copies: 40058 -> 40162 (+0.26%) Branches: 31209 -> 31211 (+0.01%) PreSGPRs: 9927 -> 10125 (+1.99%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>	2021-07-16 14:31:54 +00:00
Timur Kristóf	b12318f26c	aco: Swap s_and operand order for ballot. This allows our optimizer to recognize this and eliminate it when it can prove that the s_and with exec is unneeded. Fossil DB changes on Sienna Cichlid: Totals from 1969 (1.53% of 128647) affected shaders: CodeSize: 9468228 -> 9469348 (+0.01%); split: -0.00%, +0.01% Instrs: 1773566 -> 1773581 (+0.00%); split: -0.01%, +0.01% Latency: 19504042 -> 19503385 (-0.00%); split: -0.00%, +0.00% InvThroughput: 3617406 -> 3617333 (-0.00%) Copies: 108998 -> 110592 (+1.46%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>	2021-07-16 14:31:54 +00:00
Daniel Schürmann	114d38e57d	aco/isel: avoid unnecessary calls to nir_unsigned_upper_bound() These were responsible for ~20% of the time spent in instruction selection. Reduces overall compile times by ~0.5%. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11879>	2021-07-14 18:10:40 +02:00
Timur Kristóf	182d9b1e60	aco: Implement NGG culling related intrinsics. These are very straightforward as they just copy data from the newly added shader arguments. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>	2021-07-13 23:56:33 +00:00
Tony Wasserka	cfd866ed42	aco: Clean up unneeded literal casts These were only needed to select the appropriate Operand constructor before. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>	2021-07-13 17:43:26 +00:00
Tony Wasserka	66e51dc474	aco: Remove use of deprecated Operand constructors This migration was done with libclang-based automatic tooling, which performed these replacements: * Operand(uint8_t) -> Operand::c8 * Operand(uint16_t) -> Operand::c16 * Operand(uint32_t, false) -> Operand::c32 * Operand(uint32_t, bool) -> Operand::c32_or_c64 * Operand(uint64_t) -> Operand::c64 * Operand(0) -> Operand::zero(num_bytes) Casts that were previously used for constructor selection have automatically been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)). Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>	2021-07-13 17:43:26 +00:00
Daniel Schürmann	b97cd93b35	aco: fix extract_vector optimization If the allocated_vec map contains a different RegType for the elements, ensure that the size matches exactly. Otherwise, it could happen that extracting a dword element matched with a subdword element. No fossil-db changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11823>	2021-07-13 09:14:43 +02:00
Daniel Schürmann	1e2639026f	aco: Format. Manually adjusted some comments for more intuitive line breaks. Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11258>	2021-07-12 21:27:31 +00:00
Samuel Pitoiset	ee79b87c62	radv: lower primitive shading rate in NIR This allows more potential compiler optimizations if the value is a constant or from a scalar load. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11579>	2021-07-12 17:54:07 +00:00
Daniel Schürmann	0eea0e55ad	aco: add 'common/' and 'llvm/' prefix to #includes Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>	2021-07-12 12:09:31 +00:00
Daniel Schürmann	59fdaa1985	aco: reorder and cleanup #includes Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>	2021-07-12 12:09:31 +00:00
Samuel Pitoiset	543eb42c35	aco: use nir_ssa_def_is_unused() to determine if atomic dest is used Instead of duplicating this chunk everywhere. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11793>	2021-07-09 12:12:57 +00:00
Samuel Pitoiset	74a221bcfd	aco: fix shared_atomic_comp_swap if the second source isn't a VGPR Only VGPRs are valid with DS instructions. Cc: 21.1 mesa-stable Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11777>	2021-07-08 10:41:14 +00:00
Rhys Perry	a9c4a31d8d	aco: handle NIR loops without breaks Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11626>	2021-07-01 10:01:52 +00:00
Rhys Perry	c094765a01	aco: remove resource flags After disabling SMEM stores, nir_opt_access() now does the same analysis and we don't need this anymore. Doing it in isel is also too late if we want to lower descriptor loads in NIR. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11652>	2021-06-30 19:07:12 +01:00
Timur Kristóf	e6bf5cfe59	aco/gfx10: Emit barrier at the start of NGG VS and TES. The Navi 1x NGG hardware can hang in certain conditions when not every wave has launched before s_sendmsg(GS_ALLOC_REQ). As a workaround, to ensure this never happens, let's emit a workgroup barrier at the beginning of NGG VS and TES. Note that NGG GS already has a workgroup barrier so it doesn't need this. Cc: mesa-stable Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10837>	2021-06-22 14:32:27 +00:00
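
Commit 0812d440c7 above switches the container underneath std::stack from the default std::deque to std::vector. The following is a minimal sketch of that standard-library technique, not code from Mesa: the alias `vector_stack`, the function `visit_preorder`, and the small tree in the usage note are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <deque>
#include <stack>
#include <type_traits>
#include <vector>

// Hypothetical alias (not from the Mesa source): a stack whose elements
// live in one contiguous std::vector instead of the default std::deque.
template <typename T>
using vector_stack = std::stack<T, std::vector<T>>;

// The second template parameter of std::stack selects the container.
static_assert(std::is_same<std::stack<int>::container_type, std::deque<int>>::value,
              "std::stack defaults to std::deque");
static_assert(std::is_same<vector_stack<int>::container_type, std::vector<int>>::value,
              "vector_stack is backed by std::vector");

// Illustrative worklist user: pre-order walk over a tree given as
// per-node lists of child indices. Returns nodes in visiting order.
std::vector<unsigned>
visit_preorder(const std::vector<std::vector<unsigned>>& children, unsigned root)
{
   assert(root < children.size());

   std::vector<unsigned> order;
   vector_stack<unsigned> work;
   work.push(root);

   while (!work.empty()) {
      unsigned node = work.top();
      work.pop();
      order.push_back(node);

      // Push children in reverse so the first child is popped first.
      for (auto it = children[node].rbegin(); it != children[node].rend(); ++it)
         work.push(*it);
   }
   return order;
}
```

For a tree where node 0 has children 1 and 2 and node 1 has child 3, `visit_preorder({{1, 2}, {3}, {}, {}}, 0)` visits the nodes in the order 0, 1, 3, 2. As the commit message notes, the trade-off is that a push may occasionally reallocate, which is acceptable when the element count stays low.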


886 commits