fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Rhys Perry	786828131a	aco: implement 8/16-bit instructions which can be trivially widened When nir_lower_bit_size becomes more capable, we might want to revert some of this. fossil-db (parallel-rdp, Navi): Totals from 217 (31.77% of 683) affected shaders: SGPRs: 11320 -> 10200 (-9.89%) VGPRs: 7156 -> 7364 (+2.91%) CodeSize: 1453948 -> 1430136 (-1.64%); split: -1.66%, +0.02% Instrs: 258530 -> 254840 (-1.43%); split: -1.44%, +0.01% Cycles: 37334360 -> 37247936 (-0.23%); split: -0.26%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791>	2020-11-04 11:50:37 +00:00
Rhys Perry	ef95ba8cdd	aco: implement some 16-bit arithmetic instead of lowering fossil-db (parallel-rdp, Navi): Totals from 210 (30.75% of 683) affected shaders: SGPRs: 9704 -> 10248 (+5.61%) VGPRs: 5884 -> 5368 (-8.77%) CodeSize: 1155564 -> 1098752 (-4.92%) Instrs: 199927 -> 189940 (-5.00%) Cycles: 20438392 -> 19860124 (-2.83%) v2: use divergence analysis to determine which instructions to lower. Co-Authored-by: Daniel Schürmann <daniel@schuermann.dev> Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791>	2020-11-04 11:50:37 +00:00
Samuel Pitoiset	57c152af9c	aco: select v_mul_{hi}_u32_u24 for 24-bit multiplications This is based on the NIR range analysis. v_mul_u32_u24 is VOP2, while v_mul_lo_u32 is VOP3, so that should reduce codesize. fossil-db (Vega10): Totals from 12590 (9.22% of 136546) affected shaders: SGPRs: 680207 -> 677271 (-0.43%); split: -0.47%, +0.04% VGPRs: 620840 -> 620856 (+0.00%); split: -0.02%, +0.02% CodeSize: 37930200 -> 37774088 (-0.41%); split: -0.41%, +0.00% Instrs: 7463550 -> 7458120 (-0.07%); split: -0.07%, +0.00% Cycles: 133487628 -> 133427532 (-0.05%); split: -0.05%, +0.00% VMEM: 2514729 -> 2513426 (-0.05%); split: +0.02%, -0.08% SMEM: 1533579 -> 1532795 (-0.05%); split: +0.05%, -0.10% VClause: 231391 -> 231389 (-0.00%); split: -0.01%, +0.00% SClause: 255352 -> 255294 (-0.02%); split: -0.04%, +0.02% Copies: 605821 -> 600352 (-0.90%); split: -0.92%, +0.02% Branches: 133739 -> 133743 (+0.00%); split: -0.00%, +0.00% PreSGPRs: 351092 -> 348048 (-0.87%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405>	2020-11-03 13:47:40 +00:00
Samuel Pitoiset	3a72021d7c	aco: store NIR range analysis data to the isel context It will be used to optimize some ALU instructions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405>	2020-11-03 13:47:40 +00:00
James Park	4bd18e772a	amd/llvm,aco: Replace VLA with alloca MSVC will never support VLA, so use alloca instead. Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7157>	2020-11-03 07:44:02 +00:00
Samuel Pitoiset	03f260cb27	radv,aco: optimize computing the sample mask for per-sample shading I don't know why these values were introduced, but it seems like we can optimize this by just doing: gl_SampleMaskIn[0] = (SampleCoverage & (1 << gl_SampleID)) AMDGPU-PRO and AMDVLK apply the same formula to compute the sample mask when per-sample shading is enabled. No fossil-db changes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377>	2020-11-02 08:05:47 +01:00
Samuel Pitoiset	c63bcda22c	radv,aco: adjust the sample mask only if per-sample shading is enabled When per-sample shading isn't enabled, we can just load the samplemask from the hardware which is always the coverage of the entire pixel/fragment. fossil-db (VEGA10): Totals from 131 (0.10% of 136546) affected shaders: SGPRs: 5056 -> 5048 (-0.16%) VGPRs: 2600 -> 2372 (-8.77%) CodeSize: 115788 -> 112560 (-2.79%) MaxWaves: 1266 -> 1274 (+0.63%) Instrs: 20620 -> 20071 (-2.66%) Cycles: 82416 -> 80220 (-2.66%) VMEM: 51567 -> 35532 (-31.10%); split: +0.24%, -31.34% SMEM: 8952 -> 8258 (-7.75%); split: +0.11%, -7.86% SClause: 1223 -> 1199 (-1.96%); split: -2.62%, +0.65% Copies: 1247 -> 1124 (-9.86%); split: -10.18%, +0.32% PreVGPRs: 2112 -> 1981 (-6.20%) Helps Britannia, Shadow of the Tomb Raider, Warhammer II and Control. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377>	2020-11-02 08:05:43 +01:00
Daniel Schürmann	f4c090a3b3	aco: refactor split_store_data() to always split into evenly sized elements This fixes a couple of issues on GFX67 and has no negative impact on newer hardware Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7105>	2020-10-29 14:32:59 +00:00
Timur Kristóf	09b9e52c0d	aco/ngg: Export a zero-area triangle when primitive count is 0. This is a workaround for a bug in Navi 1x NGG HW. Very rarely, the Navi 1x PA can hang when an NGG workgroup exports 0 total primitives. According to AMD, we always need this workaround when it is possible that the number of primitives is 0. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>	2020-10-28 21:55:47 +01:00
Timur Kristóf	b6654adc0e	aco: Make emitting reduction instructions a bit more convenient. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>	2020-10-28 21:47:22 +01:00
Timur Kristóf	260f9c503a	aco/ngg: Put shader query reduction operand into a VGPR. The p_reduce instruction only works if this operand is in a VGPR, and otherwise gets lowered to incorrect code. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>	2020-10-28 21:47:22 +01:00
Timur Kristóf	9757c3cb6b	aco: Assert that workgroup barriers are not used inappropriately. Example: It is possible for some NGG GS waves to have 0 ES and/or GS invocations, and in that case having an s_barrier inside divergent control flow can very possibly hang the GPU. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>	2020-10-28 21:47:19 +01:00
Rhys Perry	483657de32	aco: use mubuf helper in select_gs_copy_shader Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103>	2020-10-28 14:59:49 +00:00
Rhys Perry	ec7ecfe9cb	aco: use control flow creation helpers in select_gs_copy_shader Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103>	2020-10-28 14:59:49 +00:00
Daniel Schürmann	543f50789a	aco: implement nir_op_unpack_[64/32]_* Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527>	2020-10-28 10:14:26 +00:00
Rhys Perry	26e53e3afa	aco: ignore the ACO-inserted continue in create_continue_phis() Otherwise, for loops without continue_or_break, create_continue_phis() always returns an undef operand. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `638cbc21a1` ("aco: handle when ACO adds new continue edges") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2848 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7148>	2020-10-27 19:53:38 +00:00
Rhys Perry	437995bb70	aco: remove all-undef phi opt This doesn't look like it would create correct IR for 8/16-bit phis and doesn't seem to help anything. If we ever want to do this, it's probably better done in nir_opt_remove_phis(). No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216>	2020-10-27 15:24:38 +00:00
Rhys Perry	d20a752c0d	aco: use Builder::copy more fossil-db (Navi): Totals from 6973 (5.07% of 137413) affected shaders: SGPRs: 381768 -> 381776 (+0.00%) VGPRs: 306092 -> 306096 (+0.00%); split: -0.00%, +0.00% CodeSize: 24440844 -> 24421196 (-0.08%); split: -0.09%, +0.01% MaxWaves: 86581 -> 86583 (+0.00%) Instrs: 4682161 -> 4679578 (-0.06%); split: -0.06%, +0.00% Cycles: 68793116 -> 68261648 (-0.77%); split: -0.83%, +0.05% fossil-db (Polaris): Totals from 8154 (5.87% of 138881) affected shaders: VGPRs: 338916 -> 338920 (+0.00%); split: -0.00%, +0.00% CodeSize: 23540428 -> 23540488 (+0.00%); split: -0.00%, +0.00% MaxWaves: 49090 -> 49091 (+0.00%) Instrs: 4576085 -> 4576101 (+0.00%); split: -0.00%, +0.00% Cycles: 51720704 -> 51720888 (+0.00%); split: -0.00%, +0.00% Most of the Navi cycle/instruction changes are from 8/16-bit parallel-rdp shaders. They appear to be improved because the p_create_vector from lower_subdword_phis() was blocking constant propagation. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216>	2020-10-27 15:24:38 +00:00
Rhys Perry	72b307a338	aco: don't do divergent break+discard If the shader does: loop { if (divergent) discard else a() b() } then a()'s block will dominate b()'s block in the logical CFG, but not the linear CFG. This will cause value numbering to try to combine SALU from a() and b(). This didn't happen with break/continue because sanitize_if() would move a() out of the branch. Using sanitize_if() to fix this doesn't look easy, because discards are not control flow instructions in NIR. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216>	2020-10-27 15:24:38 +00:00
Rhys Perry	27ce5d921e	aco: remove isel_context::allocated Now that we have Program::temp_rc, we can replace it with the first temporary id allocated for NIR's ssa defs. No fossil-db changes on Navi. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7067>	2020-10-26 15:14:32 +00:00
Samuel Pitoiset	4e2fe34aa9	aco: fix determining if LOD is zero for nir_texop_txf/nir_texop_txs txf/txs expect LOD to be a 32-bit unsigned integer, while other texture operations expect a float. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3668 Fixes: `93c8ebfa78` ("aco: Initial commit of independent AMD compiler") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7256>	2020-10-22 11:30:43 +00:00
Samuel Pitoiset	eb6877d3af	radv,aco: fix use of texop_samples_identical in the resolve meta path The return value of this texture intrinsic should be a NIR 1-bit bool. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7236>	2020-10-21 13:06:53 +02:00
Tony Wasserka	fd038132de	aco/isel: Miscellaneous cleanups using the new Stage API Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>	2020-10-21 09:49:38 +00:00
Tony Wasserka	34bc9477de	aco: Clean up symbol names and comments related to NGG Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>	2020-10-21 09:49:38 +00:00
Tony Wasserka	86c227c10c	aco: Use strong typing to model SW<->HW stage mappings Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>	2020-10-21 09:49:38 +00:00
Bas Nieuwenhuizen	76421667ec	aco: Add VK_KHR_shader_terminate_invocation support. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7226>	2020-10-20 22:53:08 +00:00
Timur Kristóf	d8435c1628	aco/ngg: Add assertion to make sure we always know the vertex count. Just a sanity check to avoid hangs caused by missing this in the future. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7213>	2020-10-20 07:11:29 +00:00
James Park	af8d488ea5	util,ac,aco,radv: Cross-platform memstream API POSIX memstream is not available on Windows. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7143>	2020-10-19 03:37:42 -07:00
Rhys Perry	fdb65b8b23	aco: add missing SCC clobber in get_buffer_size Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `fcd6d83245` ("aco: fix imageSize()/textureSize() with large buffers on GFX8") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7162>	2020-10-15 21:11:45 +00:00
Tony Wasserka	d5a72319d6	aco/isel: Remove now unused VS-related code from create_null_export Also replaced a hardcoded constant with the appropriate register macro. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>	2020-10-14 16:22:51 +00:00
Tony Wasserka	c22c702f35	aco/isel: Remove some dead code exported_pos was always initialized to true (due to the is_pos argument of the first export_vs_varying call being true), so none of this code has any effect. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>	2020-10-14 16:22:51 +00:00
Tony Wasserka	bf51b11c04	aco/isel: Always export position data from VS/NGG AMD ISA docs explicitly require this for VS, and this likely extends to NGG too. Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3615 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>	2020-10-14 16:22:51 +00:00
Daniel Schürmann	f29c81f863	aco: use VOP2 for v_cvt_pkrtz_f16_f32 if possible This patch also does a slight rework of export_fs_mrt_color() to avoid setting of enabled channels which are not used. Totals from 52404 (38.38% of 136546) affected shaders (NAVI): SGPRs: 3097443 -> 3097435 (-0.00%) CodeSize: 189151600 -> 188546200 (-0.32%) Instrs: 36445061 -> 36445104 (+0.00%); split: -0.00%, +0.00% Cycles: 1739388020 -> 1739388192 (+0.00%); split: -0.00%, +0.00% VMEM: 21071501 -> 21071665 (+0.00%); split: +0.00%, -0.00% SMEM: 3470983 -> 3470982 (-0.00%); split: +0.00%, -0.00% PreSGPRs: 2058965 -> 2058962 (-0.00%) PreVGPRs: 1860294 -> 1860295 (+0.00%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>	2020-10-14 15:31:38 +00:00
Daniel Schürmann	7240edec2a	aco: use VOP2 version of v_cvt_pkrtz_f16_f32 on GFX_6_7_10 Totals from 767 (0.56% of 136546) affected shaders (NAVI): CodeSize: 2862208 -> 2850036 (-0.43%) Instrs: 561572 -> 561574 (+0.00%) Cycles: 6455420 -> 6455428 (+0.00%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>	2020-10-14 15:31:38 +00:00
Daniel Schürmann	2f125908b3	radv,aco: lower_pack_half_2x16 This patch also optimizes pack_half_2x16(a, 0.0). Totals from 1949 (1.43% of 136546) affected shaders (RAVEN): SGPRs: 83376 -> 83336 (-0.05%) CodeSize: 3532144 -> 3512352 (-0.56%) Instrs: 660746 -> 660682 (-0.01%); split: -0.01%, +0.00% Cycles: 6780716 -> 6780472 (-0.00%); split: -0.00%, +0.00% VMEM: 990886 -> 990883 (-0.00%); split: +0.00%, -0.00% SMEM: 150506 -> 150538 (+0.02%); split: +0.05%, -0.03% SClause: 30595 -> 30594 (-0.00%); split: -0.01%, +0.00% Copies: 40801 -> 40729 (-0.18%) PreSGPRs: 52335 -> 52341 (+0.01%); split: -0.03%, +0.04% PreVGPRs: 45104 -> 45097 (-0.02%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>	2020-10-14 15:31:38 +00:00
Daniel Schürmann	dae1e6f756	aco: use v_cvt_pkrtz_f16_f32 for pack_half_2x16 Apparently, we forgot to remove some debug code. This patch also fixes the round mode check to consider the destination bit width. Totals from 2218 (1.62% of 136546) affected shaders (RAVEN): SGPRs: 100848 -> 100280 (-0.56%) VGPRs: 68536 -> 66044 (-3.64%); split: -3.68%, +0.05% CodeSize: 4882296 -> 4837220 (-0.92%); split: -0.94%, +0.01% MaxWaves: 18990 -> 19019 (+0.15%); split: +0.19%, -0.04% Instrs: 938150 -> 930388 (-0.83%); split: -0.83%, +0.00% Cycles: 8699824 -> 8667648 (-0.37%); split: -0.38%, +0.01% VMEM: 1144502 -> 1059680 (-7.41%); split: +0.06%, -7.48% SMEM: 170076 -> 167999 (-1.22%); split: +0.22%, -1.44% VClause: 18428 -> 18422 (-0.03%) SClause: 41375 -> 41353 (-0.05%); split: -0.06%, +0.00% Copies: 60008 -> 60054 (+0.08%); split: -0.31%, +0.39% PreVGPRs: 56163 -> 56142 (-0.04%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>	2020-10-14 15:31:38 +00:00
Daniel Schürmann	aec872cda0	aco: use p_split_vector for nir_op_unpack_half_* This enables the use of SDWA if possible Totals from 9933 (7.27% of 136546) affected shaders (RAVEN): VGPRs: 731764 -> 731772 (+0.00%); split: -0.00%, +0.00% CodeSize: 90944852 -> 90671472 (-0.30%); split: -0.30%, +0.00% Instrs: 17881885 -> 17867831 (-0.08%); split: -0.08%, +0.00% Cycles: 1597904072 -> 1597771260 (-0.01%); split: -0.01%, +0.00% VMEM: 1702328 -> 1697383 (-0.29%); split: +0.13%, -0.42% SMEM: 659583 -> 659049 (-0.08%); split: +0.01%, -0.09% VClause: 318024 -> 318025 (+0.00%); split: -0.00%, +0.00% SClause: 631670 -> 631707 (+0.01%); split: -0.01%, +0.01% Copies: 1504107 -> 1504626 (+0.03%); split: -0.01%, +0.04% PreVGPRs: 683153 -> 683180 (+0.00%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>	2020-10-14 15:31:38 +00:00
Daniel Schürmann	a38a497b86	aco: use p_create_vector for nir_op_pack_half_2x16 This enables the use of SDWA if possible Totals from 2218 (1.62% of 136546) affected shaders (RAVEN): VGPRs: 68508 -> 68516 (+0.01%) CodeSize: 4897024 -> 4881068 (-0.33%); split: -0.33%, +0.00% MaxWaves: 18992 -> 18990 (-0.01%) Instrs: 946942 -> 939161 (-0.82%); split: -0.82%, +0.00% Cycles: 8737668 -> 8705704 (-0.37%); split: -0.37%, +0.00% VMEM: 1155362 -> 1145245 (-0.88%); split: +0.00%, -0.88% SMEM: 170435 -> 170165 (-0.16%); split: +0.01%, -0.16% VClause: 18426 -> 18425 (-0.01%) SClause: 41376 -> 41375 (-0.00%) Copies: 59813 -> 59787 (-0.04%); split: -0.15%, +0.10% PreVGPRs: 56126 -> 56136 (+0.02%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>	2020-10-14 15:31:38 +00:00
Rhys Perry	c122315702	aco: fix get_ssbo_size with a vgpr resource The result of load_vulkan_descriptor is passed directly to get_ssbo_size. This caused convert_pointer_to_64_bit() to skip creating a v_readfirstlane_b32 if it was necessary. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `05b6612b4e` ('radv: do not lower UBO/SSBO access to offsets') Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3628 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7095>	2020-10-13 14:20:28 +00:00
Rhys Perry	bb5c0ba0d2	aco: implement last_invocation Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>	2020-10-13 12:47:21 +00:00
Rhys Perry	36da9c4aa2	aco: implement elect Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>	2020-10-13 12:47:20 +00:00
Rhys Perry	bf77f539ee	aco: optimize more uniform reductions/scans Uniform atomic optimization will create these. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>	2020-10-13 12:47:20 +00:00
Samuel Pitoiset	b9ca4923d6	aco: implement missing nir_op_unpack_half_2x16_split_{x,y}_flush_to_zero SPIRV->NIR emits nir_op_unpack_half_2x16_flush_to_zero instead of nir_op_unpack_half_2x16 if the shader enables denorm flush to zero for 16-bit floating point. This doesn't fix anything known and CTS doesn't have tests. Fixes: `56d9bcdded` ("radv: enable more float_controls features") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6939>	2020-10-13 08:35:22 +02:00
Samuel Pitoiset	b0829c6af7	radv: replace RADV_ALPHA_ADJUST by AC_FETCH_FORMAT Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7065>	2020-10-12 13:13:40 +00:00
Timur Kristóf	61280bb4b6	aco/ngg: Allocate NGG GS space early for const vertex/primitive counts. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:15 +02:00
Timur Kristóf	e8a0409d01	aco/ngg: Use more efficient LDS layout to help reduce bank conflicts. The LLVM backend has a trick which helps reduce LDS bank conflicts by swizzling the LDS address where each vertex is emitted. This commit implements the same thing for ACO. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:15 +02:00
Timur Kristóf	dd73719856	aco/ngg: Add shader query support to NGG GS. In each GS thread, we calculate the number of "real" primitives that were emitted (points, lines, triangles, not strips). Then we accumulate the number of "real" primitives emitted by the entire threadgroup in GDS. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:15 +02:00
Timur Kristóf	df62c8fbea	aco/ngg: Place workgroup barrier outside control flow for NGG GS. Merged shaders have a workgroup barrier which makes sure that the first half is completed in every wave before the 2nd half is started. This barrier is located in divergent control flow, so that waves that don't have any invocations in the 2nd half can finish as early as possible. This is problematic for NGG GS because it has more workgroup barriers after the 2nd half. So, for NGG GS we need to put the barrier outside control flow because otherwise the waves that have 0 GS threads won't be able to wait for the waves which have non-zero GS threads. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:15 +02:00
Timur Kristóf	1129575d5e	aco/ngg: Implement NGG GS output. We store emitted GS vertices in LDS. Then, at the end of the shader, the emitted vertices are compacted and each thread loads a single vertex from LDS in order to export a primitive as needed, and the vertex attributes. This is done because there is an impedance mismatch between how API GS and the NGG HW work. API GS can emit an arbitrary number of vertices and primitives in each thread, but NGG HW can only export one vertex per thread. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:15 +02:00
Timur Kristóf	62b5012ec3	aco/ngg: Implement workgroup reduce / exclusive scan for NGG GS. This function calculates two things at once: 1. The total number of vertices emitted by the threadgroup. 2. Exclusive scan of emitted vertex count across the threadgroup. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:15 +02:00
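
Commit 62b5012ec3 describes one pass that yields both the workgroup's total emitted-vertex count and each thread's exclusive prefix sum. The following is a minimal CPU-side sketch of that computation, assuming a two-level scheme (scan within each wave, then scan across the wave totals). The struct and function names are hypothetical; this is not ACO's implementation, which runs on the GPU with wave-level instructions and LDS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU model, not ACO code.
struct ReduceScanResult {
   uint32_t total;                  // vertices emitted by the whole workgroup
   std::vector<uint32_t> exclusive; // exclusive[i] = sum of counts[0..i-1]
};

// counts[i] is the number of vertices emitted by GS thread i;
// threads are grouped into waves of wave_size lanes.
ReduceScanResult workgroup_reduce_and_exclusive_scan(const std::vector<uint32_t> &counts,
                                                     size_t wave_size)
{
   assert(wave_size > 0);
   ReduceScanResult r{0, std::vector<uint32_t>(counts.size(), 0)};

   // Step 1: exclusive scan inside each wave, remembering each wave's total.
   std::vector<uint32_t> wave_totals;
   for (size_t base = 0; base < counts.size(); base += wave_size) {
      size_t end = std::min(base + wave_size, counts.size());
      uint32_t running = 0;
      for (size_t i = base; i < end; ++i) {
         r.exclusive[i] = running;
         running += counts[i];
      }
      wave_totals.push_back(running);
   }

   // Step 2: exclusive scan over the wave totals (on the GPU this exchange
   // would go through shared memory), then add each wave's offset to its lanes.
   uint32_t wave_offset = 0;
   for (size_t w = 0; w < wave_totals.size(); ++w) {
      size_t base = w * wave_size;
      size_t end = std::min(base + wave_size, counts.size());
      for (size_t i = base; i < end; ++i)
         r.exclusive[i] += wave_offset;
      wave_offset += wave_totals[w];
   }

   r.total = wave_offset;
   return r;
}
```

For example, counts {3, 0, 2, 1, 4} with a wave size of 2 give exclusive offsets {0, 3, 3, 5, 6} and a total of 10. Each thread's offset is where its vertices would start in a compacted buffer, in the spirit of the compaction described in 1129575d5e.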

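Commits 03f260cb27 and c63bcda22c together describe how the input sample mask is derived: the full hardware coverage when per-sample shading is off, and `SampleCoverage & (1 << gl_SampleID)` when it is on. A minimal C++ model of that rule follows; the function name and parameters are illustrative only, not RADV or ACO symbols.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of gl_SampleMaskIn[0]; not RADV/ACO code.
// coverage:  hardware coverage mask for the pixel (bit i set = sample i covered)
// sample_id: gl_SampleID of the invocation
uint32_t sample_mask_in(uint32_t coverage, uint32_t sample_id, bool per_sample_shading)
{
   if (!per_sample_shading)
      return coverage; // whole-pixel coverage, used as-is (c63bcda22c)

   assert(sample_id < 32); // shifting a 32-bit value by >= 32 is undefined
   return coverage & (1u << sample_id); // formula quoted in 03f260cb27
}
```

With per-sample shading enabled, each invocation therefore sees at most one bit set: the bit for its own sample, and only if that sample is covered.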

436 commits
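
Commit 57c152af9c relies on NIR range analysis to prove that both multiplication operands fit in 24 bits before selecting `v_mul_u32_u24`/`v_mul_hi_u32_u24`. The sketch below models those instructions as truncating each operand to its low 24 bits and returning the low or high 32 bits of the 48-bit product; this is a reading of the AMD GCN ISA description and should be treated as an assumption. The helper names are hypothetical. It shows why the range proof matters: the 24-bit forms agree with a full 32-bit multiply only when both operands are below 2^24.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics of the 24-bit VOP2 multiplies (hypothetical helpers):
// each source is truncated to bits [23:0] before multiplying.
uint32_t mul_u32_u24(uint32_t a, uint32_t b)
{
   uint64_t p = uint64_t(a & 0xFFFFFFu) * uint64_t(b & 0xFFFFFFu);
   return uint32_t(p); // bits [31:0] of the 48-bit product
}

uint32_t mul_hi_u32_u24(uint32_t a, uint32_t b)
{
   uint64_t p = uint64_t(a & 0xFFFFFFu) * uint64_t(b & 0xFFFFFFu);
   return uint32_t(p >> 32); // bits [47:32] of the 48-bit product
}

// Full 32-bit reference results (what v_mul_lo_u32 / v_mul_hi_u32 compute).
uint32_t mul_lo_u32(uint32_t a, uint32_t b) { return uint32_t(uint64_t(a) * b); }
uint32_t mul_hi_u32(uint32_t a, uint32_t b) { return uint32_t((uint64_t(a) * b) >> 32); }

// The substitution is only safe when range analysis bounds both operands by this.
bool fits_u24(uint32_t max_value) { return max_value <= 0xFFFFFFu; }
```

The motivation given in the commit is code size: VOP2 instructions use a 32-bit encoding, while VOP3 instructions use a 64-bit one.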