Commit graph

446 commits

Author SHA1 Message Date
Samuel Pitoiset
8e961b91c3 aco: optimize v_add+v_lshlrev to v_mad_u32_u24 on GFX6-8
This optimizes v_add(c, v_lshlrev(a, b)) to v_mad_u32_u24(b, 1<<a, c)
if 'a' is a constant (less than or equal to 6 to avoid creating
literals) and 'b' known to be a 16-bit or a 24-bit value.

On GFX9+, this is already optimized to v_lshl_add_u32.

No fossils-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7673>
2020-11-23 18:34:40 +00:00
Samuel Pitoiset
d9e4504b0d aco: optimize v_add+s_lshl to v_mad_u32_u24 on GFX6-8
This optimizes v_add(c, s_lshl(a, b)) to v_mad_u32_u24(a, 1<<b, c)
if 'b' is a constant (less than or equal to 6 to avoid creating
literals) and 'a' known to be a 16-bit or a 24-bit value.

On GFX9+, this is already optimized to v_lshl_add_u32.

fossils-db (Polaris10):
Totals from 1916 (1.36% of 140385) affected shaders:
SGPRs: 88322 -> 87780 (-0.61%); split: -0.66%, +0.05%
CodeSize: 7852668 -> 7851800 (-0.01%); split: -0.01%, +0.00%
Instrs: 1533965 -> 1530459 (-0.23%); split: -0.23%, +0.00%
Cycles: 57001852 -> 56983244 (-0.03%); split: -0.03%, +0.00%
VMEM: 372561 -> 371733 (-0.22%); split: +0.03%, -0.25%
SMEM: 108859 -> 103711 (-4.73%); split: +0.23%, -4.96%
VClause: 37231 -> 37204 (-0.07%)
SClause: 58116 -> 58086 (-0.05%); split: -0.06%, +0.01%
Copies: 199953 -> 199931 (-0.01%); split: -0.03%, +0.02%
Branches: 63478 -> 63477 (-0.00%)
PreSGPRs: 61818 -> 61816 (-0.00%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7673>
2020-11-23 18:34:40 +00:00
Samuel Pitoiset
eaef1f2127 aco: allow to use the range analysis UB in emit_{sop2,vop2}_instruction()
It will allow to combine v_add+s_lshl or v_add+v_lshlrev to
v_mad_u32_u24 on GFX6-8 if operands are known to be 16-bit or 24-bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7673>
2020-11-23 18:34:40 +00:00
Rhys Perry
aab507c6b0 aco: use v_mul_imm() for some nir_op_imul
Some of the optimizations v_mul_imm() does are complex and very
target-specific and not suitable to do in ACO's optimizer.

fossil-db (Vega):
Totals from 49135 (35.76% of 137413) affected shaders:
SGPRs: 2698547 -> 2696103 (-0.09%); split: -0.16%, +0.07%
VGPRs: 2301412 -> 2301600 (+0.01%); split: -0.01%, +0.02%
SpillSGPRs: 51520 -> 51519 (-0.00%)
CodeSize: 168798572 -> 169164012 (+0.22%); split: -0.00%, +0.22%
MaxWaves: 306553 -> 306539 (-0.00%); split: +0.00%, -0.01%
Instrs: 33423982 -> 33506598 (+0.25%); split: -0.00%, +0.25%
Cycles: 1807800632 -> 1804101376 (-0.20%); split: -0.20%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5390>
2020-11-20 19:50:32 +00:00
Tony Wasserka
2bb8874320 aco: Fix -Wshadow warnings
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7430>
2020-11-20 09:29:19 +00:00
Rhys Perry
867323379e aco: don't use SMEM for SSBO stores
fossil-db (Navi):
Totals from 70 (0.05% of 138791) affected shaders:
SGPRs: 2324 -> 2097 (-9.77%)
VGPRs: 1344 -> 1480 (+10.12%)
CodeSize: 157872 -> 154836 (-1.92%); split: -1.93%, +0.01%
MaxWaves: 1288 -> 1260 (-2.17%)
Instrs: 29730 -> 29108 (-2.09%); split: -2.13%, +0.04%
Cycles: 394944 -> 391280 (-0.93%); split: -0.94%, +0.01%
VMEM: 5288 -> 5695 (+7.70%); split: +11.97%, -4.27%
SMEM: 2680 -> 2444 (-8.81%); split: +1.34%, -10.15%
VClause: 291 -> 502 (+72.51%)
SClause: 1176 -> 918 (-21.94%)
Copies: 3549 -> 3517 (-0.90%); split: -1.80%, +0.90%
Branches: 1230 -> 1228 (-0.16%)
PreSGPRs: 1675 -> 1491 (-10.99%)
PreVGPRs: 1101 -> 1223 (+11.08%)

Totals from 70 (0.05% of 139517) affected shaders (RAVEN):
SGPRs: 2368 -> 2121 (-10.43%)
VGPRs: 1344 -> 1480 (+10.12%)
CodeSize: 156664 -> 153252 (-2.18%)
MaxWaves: 636 -> 622 (-2.20%)
Instrs: 29968 -> 29226 (-2.48%)
Cycles: 398284 -> 393492 (-1.20%)
VMEM: 5544 -> 5930 (+6.96%); split: +11.72%, -4.76%
SMEM: 2752 -> 2502 (-9.08%); split: +1.20%, -10.28%
VClause: 292 -> 504 (+72.60%)
SClause: 1236 -> 940 (-23.95%)
Copies: 3907 -> 3852 (-1.41%); split: -2.20%, +0.79%
Branches: 1230 -> 1228 (-0.16%)
PreSGPRs: 1671 -> 1487 (-11.01%)
PreVGPRs: 1102 -> 1225 (+11.16%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6143>
2020-11-16 15:52:22 +00:00
Samuel Pitoiset
20e48551ac aco: select v_mul_lo_u16 for 16-bit multiplications that can't overflow
Only on GFX8-9 because GFX10 doesn't zero the upper 16 bits.

No fossils-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7425>
2020-11-12 12:32:26 +00:00
Samuel Pitoiset
7028e9875f aco: select v_mad_u32_u16 for 16-bit multiplications on GFX9+
No fossils-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7425>
2020-11-12 12:32:26 +00:00
Rhys Perry
5b81e80fb6 aco: implement 64-bit images
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7234>
2020-11-09 18:28:59 +00:00
Jason Ekstrand
21b1b91549 nir,spirv: Add support for the ShaderCallKHR scope
It's currently entirely trivial.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
2020-11-05 23:36:46 +00:00
Rhys Perry
786828131a aco: implement 8/16-bit instructions which can be trivially widened
When nir_lower_bit_size becomes more capable, we might want to revert some
of this.

fossil-db (parallel-rdp, Navi):
Totals from 217 (31.77% of 683) affected shaders:
SGPRs: 11320 -> 10200 (-9.89%)
VGPRs: 7156 -> 7364 (+2.91%)
CodeSize: 1453948 -> 1430136 (-1.64%); split: -1.66%, +0.02%
Instrs: 258530 -> 254840 (-1.43%); split: -1.44%, +0.01%
Cycles: 37334360 -> 37247936 (-0.23%); split: -0.26%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791>
2020-11-04 11:50:37 +00:00
Rhys Perry
ef95ba8cdd aco: implement some 16-bit arithmetic instead of lowering
fossil-db (parallel-rdp, Navi):
Totals from 210 (30.75% of 683) affected shaders:
SGPRs: 9704 -> 10248 (+5.61%)
VGPRs: 5884 -> 5368 (-8.77%)
CodeSize: 1155564 -> 1098752 (-4.92%)
Instrs: 199927 -> 189940 (-5.00%)
Cycles: 20438392 -> 19860124 (-2.83%)

v2: use divergence analysis to determine which instructions to lower.

Co-Authored-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791>
2020-11-04 11:50:37 +00:00
Samuel Pitoiset
57c152af9c aco: select v_mul_{hi}_u32_u24 for 24-bit multiplications
This is based on the NIR range analysis. v_mul_u32_u24 is VOP2, while
v_mul_lo_u32 is VOP3, so that should reduce codesize.

fossils-db (Vega10):
Totals from 12590 (9.22% of 136546) affected shaders:
SGPRs: 680207 -> 677271 (-0.43%); split: -0.47%, +0.04%
VGPRs: 620840 -> 620856 (+0.00%); split: -0.02%, +0.02%
CodeSize: 37930200 -> 37774088 (-0.41%); split: -0.41%, +0.00%
Instrs: 7463550 -> 7458120 (-0.07%); split: -0.07%, +0.00%
Cycles: 133487628 -> 133427532 (-0.05%); split: -0.05%, +0.00%
VMEM: 2514729 -> 2513426 (-0.05%); split: +0.02%, -0.08%
SMEM: 1533579 -> 1532795 (-0.05%); split: +0.05%, -0.10%
VClause: 231391 -> 231389 (-0.00%); split: -0.01%, +0.00%
SClause: 255352 -> 255294 (-0.02%); split: -0.04%, +0.02%
Copies: 605821 -> 600352 (-0.90%); split: -0.92%, +0.02%
Branches: 133739 -> 133743 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 351092 -> 348048 (-0.87%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405>
2020-11-03 13:47:40 +00:00
Samuel Pitoiset
3a72021d7c aco: store NIR range analysis data to the isel context
It will be used to optimize some ALU instructions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405>
2020-11-03 13:47:40 +00:00
James Park
4bd18e772a amd/llvm,aco: Replace VLA with alloca
MSVC will never support VLA, so use alloca instead.

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7157>
2020-11-03 07:44:02 +00:00
Samuel Pitoiset
03f260cb27 radv,aco: optimize computing the sample mask for per-sample shading
I don't know why these values were introduced for but it seems like
we can optimize this by just doing:

gl_SampleMaskIn[0] = (SampleCoverage & (1 << gl_SampleID))

AMDGPU-PRO and AMDVLK apply the same formula to compute the
sample mask when per-sample shading is enabled.

No fossils-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377>
2020-11-02 08:05:47 +01:00
Samuel Pitoiset
c63bcda22c radv,aco: adjust the sample mask only if per-sample shading is enabled
When per-sample shading isn't enabled, we can just load the
samplemask from the hardware which is always the coverage of
the entire pixel/fragment.

fossilds-db (VEGA10):
Totals from 131 (0.10% of 136546) affected shaders:
SGPRs: 5056 -> 5048 (-0.16%)
VGPRs: 2600 -> 2372 (-8.77%)
CodeSize: 115788 -> 112560 (-2.79%)
MaxWaves: 1266 -> 1274 (+0.63%)
Instrs: 20620 -> 20071 (-2.66%)
Cycles: 82416 -> 80220 (-2.66%)
VMEM: 51567 -> 35532 (-31.10%); split: +0.24%, -31.34%
SMEM: 8952 -> 8258 (-7.75%); split: +0.11%, -7.86%
SClause: 1223 -> 1199 (-1.96%); split: -2.62%, +0.65%
Copies: 1247 -> 1124 (-9.86%); split: -10.18%, +0.32%
PreVGPRs: 2112 -> 1981 (-6.20%)

Helps Britannia, Shadow of the Tomb Raider, Warhammer II and Control.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377>
2020-11-02 08:05:43 +01:00
Daniel Schürmann
f4c090a3b3 aco: refactor split_store_data() to always split into evenly sized elements
This fixes a couple of issues on GFX67 and
has no negative impact on newer hardware

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7105>
2020-10-29 14:32:59 +00:00
Timur Kristóf
09b9e52c0d aco/ngg: Export a zero-area triangle when primitive count is 0.
This is a workaround for a bug in Navi 1x NGG HW.

Very rarely, the Navi 1x PA can hang when an NGG workgroup exports
0 total primitives. According to AMD, we always need this workaround
when it is possible that the number of primitives is 0.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>
2020-10-28 21:55:47 +01:00
Timur Kristóf
b6654adc0e aco: Make emitting reduction instructions a bit more convenient.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>
2020-10-28 21:47:22 +01:00
Timur Kristóf
260f9c503a aco/ngg: Put shader query reduction operand into a VGPR.
The p_reduce instruction only works if this operand is in a VGPR,
and otherwise gets lowered to incorrect code.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>
2020-10-28 21:47:22 +01:00
Timur Kristóf
9757c3cb6b aco: Assert that workgroup barriers are not used inappropriately.
Example:
It is possible for some NGG GS waves to have 0 ES and/or GS invocations,
and in that case having an s_barrier inside divergent control flow can
very possibly hang the GPU.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>
2020-10-28 21:47:19 +01:00
Rhys Perry
483657de32 aco: use mubuf helper in select_gs_copy_shader
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103>
2020-10-28 14:59:49 +00:00
Rhys Perry
ec7ecfe9cb aco: use control flow creation helpers in select_gs_copy_shader
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103>
2020-10-28 14:59:49 +00:00
Daniel Schürmann
543f50789a aco: implement nir_op_unpack_[64/32]_*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527>
2020-10-28 10:14:26 +00:00
Rhys Perry
26e53e3afa aco: ignore the ACO-inserted continue in create_continue_phis()
Otherwise, for loops without continue_or_break, create_continue_phis()
always returns an undef operand.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 638cbc21a1 ("aco: handle when ACO adds new continue edges")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2848
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7148>
2020-10-27 19:53:38 +00:00
Rhys Perry
437995bb70 aco: remove all-undef phi opt
This doesn't look like it would create correct IR for 8/16-bit phis and
doesn't seem to help anything. If we ever want to do this, it's probably
better done in nir_opt_remove_phis().

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216>
2020-10-27 15:24:38 +00:00
Rhys Perry
d20a752c0d aco: use Builder::copy more
fossil-db (Navi):
Totals from 6973 (5.07% of 137413) affected shaders:
SGPRs: 381768 -> 381776 (+0.00%)
VGPRs: 306092 -> 306096 (+0.00%); split: -0.00%, +0.00%
CodeSize: 24440844 -> 24421196 (-0.08%); split: -0.09%, +0.01%
MaxWaves: 86581 -> 86583 (+0.00%)
Instrs: 4682161 -> 4679578 (-0.06%); split: -0.06%, +0.00%
Cycles: 68793116 -> 68261648 (-0.77%); split: -0.83%, +0.05%

fossil-db (Polaris):
Totals from 8154 (5.87% of 138881) affected shaders:
VGPRs: 338916 -> 338920 (+0.00%); split: -0.00%, +0.00%
CodeSize: 23540428 -> 23540488 (+0.00%); split: -0.00%, +0.00%
MaxWaves: 49090 -> 49091 (+0.00%)
Instrs: 4576085 -> 4576101 (+0.00%); split: -0.00%, +0.00%
Cycles: 51720704 -> 51720888 (+0.00%); split: -0.00%, +0.00%

Most of the Navi cycle/instruction changes are from 8/16-bit parallel-rdp
shaders. They appear to be improved because the p_create_vector from
lower_subdword_phis() was blocking constant propagation.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216>
2020-10-27 15:24:38 +00:00
Rhys Perry
72b307a338 aco: don't do divergent break+discard
If the shader does:
loop {
   if (divergent)
      discard
   else
      a()
   b()
}
then a()'s block will dominate b()'s block in the logical CFG, but not the
linear CFG. This will cause value numbering to try to combine SLAU from
a() and b().

This didn't happen with break/continue because sanitize_if() would move
a() out of the branch. Using sanitize_if() to fix this doesn't look easy,
because discards are not control flow instructions in NIR.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216>
2020-10-27 15:24:38 +00:00
Rhys Perry
27ce5d921e aco: remove isel_context::allocated
Now that we have Program::temp_rc, we can replace it with the first
temporary id allocated for NIR's ssa defs.

No fossil-db changes on Navi.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7067>
2020-10-26 15:14:32 +00:00
Samuel Pitoiset
4e2fe34aa9 aco: fix determining if LOD is zero for nir_texop_txf/nir_texop_txs
txf/txs expects LOD to be a 32-bit unsigned integer while other
texture operations expects a float.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3668
Fixes: 93c8ebfa78 ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7256>
2020-10-22 11:30:43 +00:00
Samuel Pitoiset
eb6877d3af radv,aco: fix use of texop_samples_identical in the resolve meta path
The return value of this texture intrinsic should be a NIR 1-bit bool.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7236>
2020-10-21 13:06:53 +02:00
Tony Wasserka
fd038132de aco/isel: Miscellaneous cleanups using the new Stage API
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>
2020-10-21 09:49:38 +00:00
Tony Wasserka
34bc9477de aco: Clean up symbol names and comments related to NGG
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>
2020-10-21 09:49:38 +00:00
Tony Wasserka
86c227c10c aco: Use strong typing to model SW<->HW stage mappings
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094>
2020-10-21 09:49:38 +00:00
Bas Nieuwenhuizen
76421667ec aco: Add VK_KHR_shader_terminate_invocation support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7226>
2020-10-20 22:53:08 +00:00
Timur Kristóf
d8435c1628 aco/ngg: Add assertion to make sure we always know the vertex count.
Just a sanity check to avoid hangs caused by missing this
in the future.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7213>
2020-10-20 07:11:29 +00:00
James Park
af8d488ea5 util,ac,aco,radv: Cross-platform memstream API
POSIX memstream is not available on Windows.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7143>
2020-10-19 03:37:42 -07:00
Rhys Perry
fdb65b8b23 aco: add missing SCC clobber in get_buffer_size
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: fcd6d83245 ("aco: fix imageSize()/textureSize() with large buffers on GFX8")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7162>
2020-10-15 21:11:45 +00:00
Tony Wasserka
d5a72319d6 aco/isel: Remove now unused VS-related code from create_null_export
Also replaced a hardcoded constant with the appropriate register macro.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>
2020-10-14 16:22:51 +00:00
Tony Wasserka
c22c702f35 aco/isel: Remove some dead code
exported_pos was always initialized to true (due to the is_pos argument
of the first export_vs_varying call being true), so none of this code has
any effect.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>
2020-10-14 16:22:51 +00:00
Tony Wasserka
bf51b11c04 aco/isel: Always export position data from VS/NGG
AMD ISA docs explicitly require this for VS, and this likely extends to
NGG too.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3615
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102>
2020-10-14 16:22:51 +00:00
Daniel Schürmann
f29c81f863 aco: use VOP2 for v_cvt_pkrtz_f16_f32 if possible
This patch also does a slight rework of export_fs_mrt_color()
to avoid setting of enabled channels which are not used.

Totals from 52404 (38.38% of 136546) affected shaders (NAVI):
SGPRs: 3097443 -> 3097435 (-0.00%)
CodeSize: 189151600 -> 188546200 (-0.32%)
Instrs: 36445061 -> 36445104 (+0.00%); split: -0.00%, +0.00%
Cycles: 1739388020 -> 1739388192 (+0.00%); split: -0.00%, +0.00%
VMEM: 21071501 -> 21071665 (+0.00%); split: +0.00%, -0.00%
SMEM: 3470983 -> 3470982 (-0.00%); split: +0.00%, -0.00%
PreSGPRs: 2058965 -> 2058962 (-0.00%)
PreVGPRs: 1860294 -> 1860295 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
7240edec2a aco: use VOP2 version of v_cvt_pkrtz_f16_f32 on GFX_6_7_10
Totals from 767 (0.56% of 136546) affected shaders (NAVI):
CodeSize: 2862208 -> 2850036 (-0.43%)
Instrs: 561572 -> 561574 (+0.00%)
Cycles: 6455420 -> 6455428 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
2f125908b3 radv,aco: lower_pack_half_2x16
This patch also optimizes pack_half_2x16(a, 0.0).

Totals from 1949 (1.43% of 136546) affected shaders (RAVEN):
SGPRs: 83376 -> 83336 (-0.05%)
CodeSize: 3532144 -> 3512352 (-0.56%)
Instrs: 660746 -> 660682 (-0.01%); split: -0.01%, +0.00%
Cycles: 6780716 -> 6780472 (-0.00%); split: -0.00%, +0.00%
VMEM: 990886 -> 990883 (-0.00%); split: +0.00%, -0.00%
SMEM: 150506 -> 150538 (+0.02%); split: +0.05%, -0.03%
SClause: 30595 -> 30594 (-0.00%); split: -0.01%, +0.00%
Copies: 40801 -> 40729 (-0.18%)
PreSGPRs: 52335 -> 52341 (+0.01%); split: -0.03%, +0.04%
PreVGPRs: 45104 -> 45097 (-0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
dae1e6f756 aco: use v_cvt_pkrtz_f16_f32 for pack_half_2x16
Apparently, we forgot to remove some debug code.
This patch also fixes the round mode check to consider
the destination bit width.

Totals from 2218 (1.62% of 136546) affected shaders (RAVEN):
SGPRs: 100848 -> 100280 (-0.56%)
VGPRs: 68536 -> 66044 (-3.64%); split: -3.68%, +0.05%
CodeSize: 4882296 -> 4837220 (-0.92%); split: -0.94%, +0.01%
MaxWaves: 18990 -> 19019 (+0.15%); split: +0.19%, -0.04%
Instrs: 938150 -> 930388 (-0.83%); split: -0.83%, +0.00%
Cycles: 8699824 -> 8667648 (-0.37%); split: -0.38%, +0.01%
VMEM: 1144502 -> 1059680 (-7.41%); split: +0.06%, -7.48%
SMEM: 170076 -> 167999 (-1.22%); split: +0.22%, -1.44%
VClause: 18428 -> 18422 (-0.03%)
SClause: 41375 -> 41353 (-0.05%); split: -0.06%, +0.00%
Copies: 60008 -> 60054 (+0.08%); split: -0.31%, +0.39%
PreVGPRs: 56163 -> 56142 (-0.04%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
aec872cda0 aco: use p_split_vector for nir_op_unpack_half_*
This enables the use of SDWA if possible

Totals from 9933 (7.27% of 136546) affected shaders (RAVEN):
VGPRs: 731764 -> 731772 (+0.00%); split: -0.00%, +0.00%
CodeSize: 90944852 -> 90671472 (-0.30%); split: -0.30%, +0.00%
Instrs: 17881885 -> 17867831 (-0.08%); split: -0.08%, +0.00%
Cycles: 1597904072 -> 1597771260 (-0.01%); split: -0.01%, +0.00%
VMEM: 1702328 -> 1697383 (-0.29%); split: +0.13%, -0.42%
SMEM: 659583 -> 659049 (-0.08%); split: +0.01%, -0.09%
VClause: 318024 -> 318025 (+0.00%); split: -0.00%, +0.00%
SClause: 631670 -> 631707 (+0.01%); split: -0.01%, +0.01%
Copies: 1504107 -> 1504626 (+0.03%); split: -0.01%, +0.04%
PreVGPRs: 683153 -> 683180 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Daniel Schürmann
a38a497b86 aco: use p_create_vector for nir_op_pack_half_2x16
This enables the use of SDWA if possible

Totals from 2218 (1.62% of 136546) affected shaders (RAVEN):
VGPRs: 68508 -> 68516 (+0.01%)
CodeSize: 4897024 -> 4881068 (-0.33%); split: -0.33%, +0.00%
MaxWaves: 18992 -> 18990 (-0.01%)
Instrs: 946942 -> 939161 (-0.82%); split: -0.82%, +0.00%
Cycles: 8737668 -> 8705704 (-0.37%); split: -0.37%, +0.00%
VMEM: 1155362 -> 1145245 (-0.88%); split: +0.00%, -0.88%
SMEM: 170435 -> 170165 (-0.16%); split: +0.01%, -0.16%
VClause: 18426 -> 18425 (-0.01%)
SClause: 41376 -> 41375 (-0.00%)
Copies: 59813 -> 59787 (-0.04%); split: -0.15%, +0.10%
PreVGPRs: 56126 -> 56136 (+0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Rhys Perry
c122315702 aco: fix get_ssbo_size with a vgpr resource
The result of load_vulkan_descriptor is passed directly to get_ssbo_size.
This caused convert_pointer_to_64_bit() to skip creating a
v_readfirstlane_b32 if it was necessary.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 05b6612b4e ('radv: do not lower UBO/SSBO access to offsets')
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3628
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7095>
2020-10-13 14:20:28 +00:00
Rhys Perry
bb5c0ba0d2 aco: implement last_invocation
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
2020-10-13 12:47:21 +00:00