Rhys Perry
aab507c6b0
aco: use v_mul_imm() for some nir_op_imul
...
Some of the optimizations v_mul_imm() does are complex and very
target-specific and not suitable to do in ACO's optimizer.
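As a rough illustration of the simpler, target-independent side of multiplying by a known constant, the sketch below picks between a copy, a shift, a shift-and-add, and a real multiply. It is not ACO's v_mul_imm(): the function name mul_by_constant and the choice of cases are assumptions made for this example only.

```cpp
#include <cstdint>

// Hypothetical illustration, not ACO's v_mul_imm(): multiply x by a
// constant that is known at compile time, preferring shifts and adds.
uint32_t mul_by_constant(uint32_t x, uint32_t imm)
{
   if (imm == 0)
      return 0;                      // x * 0
   if (imm == 1)
      return x;                      // x * 1 is a plain copy
   if ((imm & (imm - 1)) == 0) {     // exactly one bit set: power of two
      unsigned shift = 0;
      while ((imm >> shift) != 1)
         shift++;
      return x << shift;
   }
   uint32_t rest = imm & (imm - 1);  // imm with its lowest set bit cleared
   if ((rest & (rest - 1)) == 0) {   // exactly two bits set: shift + add
      unsigned lo = 0, hi = 0;
      while (!((imm >> lo) & 1))
         lo++;
      while ((rest >> hi) != 1)
         hi++;
      return (x << hi) + (x << lo);
   }
   return x * imm;                   // general case: a real multiply
}
```

For example, mul_by_constant(x, 10) takes the two-bit path and computes (x << 3) + (x << 1). Which of these forms is actually cheaper depends on the target, which is the kind of decision the message says does not belong in the optimizer.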
fossil-db (Vega):
Totals from 49135 (35.76% of 137413) affected shaders:
SGPRs: 2698547 -> 2696103 (-0.09%); split: -0.16%, +0.07%
VGPRs: 2301412 -> 2301600 (+0.01%); split: -0.01%, +0.02%
SpillSGPRs: 51520 -> 51519 (-0.00%)
CodeSize: 168798572 -> 169164012 (+0.22%); split: -0.00%, +0.22%
MaxWaves: 306553 -> 306539 (-0.00%); split: +0.00%, -0.01%
Instrs: 33423982 -> 33506598 (+0.25%); split: -0.00%, +0.25%
Cycles: 1807800632 -> 1804101376 (-0.20%); split: -0.20%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5390 >
2020-11-20 19:50:32 +00:00
Tony Wasserka
2bb8874320
aco: Fix -Wshadow warnings
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7430 >
2020-11-20 09:29:19 +00:00
Rhys Perry
867323379e
aco: don't use SMEM for SSBO stores
...
fossil-db (Navi):
Totals from 70 (0.05% of 138791) affected shaders:
SGPRs: 2324 -> 2097 (-9.77%)
VGPRs: 1344 -> 1480 (+10.12%)
CodeSize: 157872 -> 154836 (-1.92%); split: -1.93%, +0.01%
MaxWaves: 1288 -> 1260 (-2.17%)
Instrs: 29730 -> 29108 (-2.09%); split: -2.13%, +0.04%
Cycles: 394944 -> 391280 (-0.93%); split: -0.94%, +0.01%
VMEM: 5288 -> 5695 (+7.70%); split: +11.97%, -4.27%
SMEM: 2680 -> 2444 (-8.81%); split: +1.34%, -10.15%
VClause: 291 -> 502 (+72.51%)
SClause: 1176 -> 918 (-21.94%)
Copies: 3549 -> 3517 (-0.90%); split: -1.80%, +0.90%
Branches: 1230 -> 1228 (-0.16%)
PreSGPRs: 1675 -> 1491 (-10.99%)
PreVGPRs: 1101 -> 1223 (+11.08%)
Totals from 70 (0.05% of 139517) affected shaders (RAVEN):
SGPRs: 2368 -> 2121 (-10.43%)
VGPRs: 1344 -> 1480 (+10.12%)
CodeSize: 156664 -> 153252 (-2.18%)
MaxWaves: 636 -> 622 (-2.20%)
Instrs: 29968 -> 29226 (-2.48%)
Cycles: 398284 -> 393492 (-1.20%)
VMEM: 5544 -> 5930 (+6.96%); split: +11.72%, -4.76%
SMEM: 2752 -> 2502 (-9.08%); split: +1.20%, -10.28%
VClause: 292 -> 504 (+72.60%)
SClause: 1236 -> 940 (-23.95%)
Copies: 3907 -> 3852 (-1.41%); split: -2.20%, +0.79%
Branches: 1230 -> 1228 (-0.16%)
PreSGPRs: 1671 -> 1487 (-11.01%)
PreVGPRs: 1102 -> 1225 (+11.16%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6143 >
2020-11-16 15:52:22 +00:00
Samuel Pitoiset
20e48551ac
aco: select v_mul_lo_u16 for 16-bit multiplications that can't overflow
...
Only on GFX8-9 because GFX10 doesn't zero the upper 16 bits.
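A minimal sketch of why the zeroing matters, under stated assumptions: the three helpers below are hypothetical models written for this note, not ACO code. The "preserving" variant assumes that a destination whose upper half is not zeroed simply keeps its old upper bits.

```cpp
#include <cstdint>

// Model of a 16-bit multiply that zeroes the upper half of the 32-bit
// destination (the behaviour the message relies on for GFX8-9).
uint32_t mul_lo_u16_zeroing(uint32_t a, uint32_t b)
{
   return static_cast<uint16_t>(a * b);
}

// Assumed model of the other case: the upper 16 bits of the destination
// keep whatever value they held before the multiply.
uint32_t mul_lo_u16_preserving(uint32_t old_dst, uint32_t a, uint32_t b)
{
   return (old_dst & 0xffff0000u) | static_cast<uint16_t>(a * b);
}

// Hypothetical predicate: given upper bounds on both operands, can the
// product overflow 16 bits?
bool product_fits_u16(uint32_t a_ub, uint32_t b_ub)
{
   return static_cast<uint64_t>(a_ub) * b_ub <= 0xffffu;
}
```

When product_fits_u16() holds, the zeroing model returns exactly the 32-bit product, so it can stand in for a 32-bit multiply. The preserving model can leave stale upper bits in the result, so the same substitution would not be safe there.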
No fossil-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7425 >
2020-11-12 12:32:26 +00:00
Samuel Pitoiset
7028e9875f
aco: select v_mad_u32_u16 for 16-bit multiplications on GFX9+
...
No fossil-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7425 >
2020-11-12 12:32:26 +00:00
Rhys Perry
5b81e80fb6
aco: implement 64-bit images
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7234 >
2020-11-09 18:28:59 +00:00
Jason Ekstrand
21b1b91549
nir,spirv: Add support for the ShaderCallKHR scope
...
It's currently entirely trivial.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479 >
2020-11-05 23:36:46 +00:00
Rhys Perry
786828131a
aco: implement 8/16-bit instructions which can be trivially widened
...
When nir_lower_bit_size becomes more capable, we might want to revert some
of this.
fossil-db (parallel-rdp, Navi):
Totals from 217 (31.77% of 683) affected shaders:
SGPRs: 11320 -> 10200 (-9.89%)
VGPRs: 7156 -> 7364 (+2.91%)
CodeSize: 1453948 -> 1430136 (-1.64%); split: -1.66%, +0.02%
Instrs: 258530 -> 254840 (-1.43%); split: -1.44%, +0.01%
Cycles: 37334360 -> 37247936 (-0.23%); split: -0.26%, +0.03%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791 >
2020-11-04 11:50:37 +00:00
Rhys Perry
ef95ba8cdd
aco: implement some 16-bit arithmetic instead of lowering
...
fossil-db (parallel-rdp, Navi):
Totals from 210 (30.75% of 683) affected shaders:
SGPRs: 9704 -> 10248 (+5.61%)
VGPRs: 5884 -> 5368 (-8.77%)
CodeSize: 1155564 -> 1098752 (-4.92%)
Instrs: 199927 -> 189940 (-5.00%)
Cycles: 20438392 -> 19860124 (-2.83%)
v2: use divergence analysis to determine which instructions to lower.
Co-Authored-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791 >
2020-11-04 11:50:37 +00:00
Samuel Pitoiset
57c152af9c
aco: select v_mul_{hi}_u32_u24 for 24-bit multiplications
...
This is based on the NIR range analysis. v_mul_u32_u24 is VOP2, while
v_mul_lo_u32 is VOP3, so that should reduce codesize.
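The selection condition can be sketched as follows, assuming the range analysis yields an unsigned upper bound for each operand. The names mul_u32_u24 and can_use_mul_u24 are hypothetical; the model treats the 24-bit multiply as using only the low 24 bits of each operand and returning the low 32 bits of the product.

```cpp
#include <cstdint>

constexpr uint32_t kMask24 = 0xffffffu;

// Model of a 24-bit multiply: only the low 24 bits of each operand take
// part, and the low 32 bits of the product are returned.
uint32_t mul_u32_u24(uint32_t a, uint32_t b)
{
   return static_cast<uint32_t>(static_cast<uint64_t>(a & kMask24) * (b & kMask24));
}

// Hypothetical check: both operands are known to fit in 24 bits, so the
// 24-bit multiply gives the same low 32 bits as a full 32-bit multiply.
bool can_use_mul_u24(uint32_t a_ub, uint32_t b_ub)
{
   return a_ub <= kMask24 && b_ub <= kMask24;
}
```

If either bound exceeds 24 bits the results can differ: mul_u32_u24(0x1000001, 3) yields 3 in this model, while a full 32-bit multiply yields 0x3000003.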
fossil-db (Vega10):
Totals from 12590 (9.22% of 136546) affected shaders:
SGPRs: 680207 -> 677271 (-0.43%); split: -0.47%, +0.04%
VGPRs: 620840 -> 620856 (+0.00%); split: -0.02%, +0.02%
CodeSize: 37930200 -> 37774088 (-0.41%); split: -0.41%, +0.00%
Instrs: 7463550 -> 7458120 (-0.07%); split: -0.07%, +0.00%
Cycles: 133487628 -> 133427532 (-0.05%); split: -0.05%, +0.00%
VMEM: 2514729 -> 2513426 (-0.05%); split: +0.02%, -0.08%
SMEM: 1533579 -> 1532795 (-0.05%); split: +0.05%, -0.10%
VClause: 231391 -> 231389 (-0.00%); split: -0.01%, +0.00%
SClause: 255352 -> 255294 (-0.02%); split: -0.04%, +0.02%
Copies: 605821 -> 600352 (-0.90%); split: -0.92%, +0.02%
Branches: 133739 -> 133743 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 351092 -> 348048 (-0.87%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405 >
2020-11-03 13:47:40 +00:00
Samuel Pitoiset
3a72021d7c
aco: store NIR range analysis data to the isel context
...
It will be used to optimize some ALU instructions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405 >
2020-11-03 13:47:40 +00:00
James Park
4bd18e772a
amd/llvm,aco: Replace VLA with alloca
...
MSVC will never support VLA, so use alloca instead.
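A minimal sketch of the pattern, not the code from this commit: the macro STACK_ALLOC and the function sum_with_scratch are made up for the example. It assumes <alloca.h> is available on non-MSVC platforms and uses _alloca from <malloc.h> on MSVC.

```cpp
#include <cstddef>
#include <cstring>
#if defined(_MSC_VER)
#include <malloc.h>
#define STACK_ALLOC(size) _alloca(size)
#else
#include <alloca.h>
#define STACK_ALLOC(size) alloca(size)
#endif

// Sums `count` values through a stack scratch buffer sized at run time.
// A VLA version would declare `unsigned scratch[count];`, which MSVC
// rejects; alloca gives the same stack lifetime without that syntax.
unsigned sum_with_scratch(const unsigned *values, size_t count)
{
   unsigned *scratch =
      static_cast<unsigned *>(STACK_ALLOC(count * sizeof(unsigned)));
   std::memcpy(scratch, values, count * sizeof(unsigned));

   unsigned total = 0;
   for (size_t i = 0; i < count; i++)
      total += scratch[i];
   return total;
}
```

As with a VLA, the buffer is released when the function returns, so it must not escape the function, and large sizes can still overflow the stack.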
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7157 >
2020-11-03 07:44:02 +00:00
Samuel Pitoiset
03f260cb27
radv,aco: optimize computing the sample mask for per-sample shading
...
I don't know what these values were introduced for, but it seems like
we can optimize this by just doing:
gl_SampleMaskIn[0] = (SampleCoverage & (1 << gl_SampleID))
AMDGPU-PRO and AMDVLK apply the same formula to compute the
sample mask when per-sample shading is enabled.
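The formula above, written out as a small function for illustration. The function and parameter names are assumptions; sample_coverage stands for the coverage mask of the whole pixel and sample_id for gl_SampleID.

```cpp
#include <cassert>
#include <cstdint>

// gl_SampleMaskIn[0] for per-sample shading: keep only the bit of the
// pixel coverage mask that belongs to the sample being shaded.
uint32_t sample_mask_in(uint32_t sample_coverage, uint32_t sample_id)
{
   assert(sample_id < 32);  // the shift below is only defined for 0..31
   return sample_coverage & (1u << sample_id);
}
```

With a coverage of 0b1011, sample 1 sees 0b0010 and sample 2 sees 0, since that sample is not covered.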
No fossil-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377 >
2020-11-02 08:05:47 +01:00
Samuel Pitoiset
c63bcda22c
radv,aco: adjust the sample mask only if per-sample shading is enabled
...
When per-sample shading isn't enabled, we can just load the
samplemask from the hardware which is always the coverage of
the entire pixel/fragment.
fossil-db (VEGA10):
Totals from 131 (0.10% of 136546) affected shaders:
SGPRs: 5056 -> 5048 (-0.16%)
VGPRs: 2600 -> 2372 (-8.77%)
CodeSize: 115788 -> 112560 (-2.79%)
MaxWaves: 1266 -> 1274 (+0.63%)
Instrs: 20620 -> 20071 (-2.66%)
Cycles: 82416 -> 80220 (-2.66%)
VMEM: 51567 -> 35532 (-31.10%); split: +0.24%, -31.34%
SMEM: 8952 -> 8258 (-7.75%); split: +0.11%, -7.86%
SClause: 1223 -> 1199 (-1.96%); split: -2.62%, +0.65%
Copies: 1247 -> 1124 (-9.86%); split: -10.18%, +0.32%
PreVGPRs: 2112 -> 1981 (-6.20%)
Helps Britannia, Shadow of the Tomb Raider, Warhammer II and Control.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377 >
2020-11-02 08:05:43 +01:00
Daniel Schürmann
f4c090a3b3
aco: refactor split_store_data() to always split into evenly sized elements
...
This fixes a couple of issues on GFX6-7 and
has no negative impact on newer hardware.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7105 >
2020-10-29 14:32:59 +00:00
Timur Kristóf
09b9e52c0d
aco/ngg: Export a zero-area triangle when primitive count is 0.
...
This is a workaround for a bug in Navi 1x NGG HW.
Very rarely, the Navi 1x PA can hang when an NGG workgroup exports
0 total primitives. According to AMD, we always need this workaround
when it is possible that the number of primitives is 0.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:55:47 +01:00
Timur Kristóf
b6654adc0e
aco: Make emitting reduction instructions a bit more convenient.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:47:22 +01:00
Timur Kristóf
260f9c503a
aco/ngg: Put shader query reduction operand into a VGPR.
...
The p_reduce instruction only works if this operand is in a VGPR,
and otherwise gets lowered to incorrect code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:47:22 +01:00
Timur Kristóf
9757c3cb6b
aco: Assert that workgroup barriers are not used inappropriately.
...
Example:
It is possible for some NGG GS waves to have 0 ES and/or GS invocations,
and in that case having an s_barrier inside divergent control flow can
very possibly hang the GPU.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:47:19 +01:00
Rhys Perry
483657de32
aco: use mubuf helper in select_gs_copy_shader
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103 >
2020-10-28 14:59:49 +00:00
Rhys Perry
ec7ecfe9cb
aco: use control flow creation helpers in select_gs_copy_shader
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103 >
2020-10-28 14:59:49 +00:00
Daniel Schürmann
543f50789a
aco: implement nir_op_unpack_[64/32]_*
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527 >
2020-10-28 10:14:26 +00:00
Rhys Perry
26e53e3afa
aco: ignore the ACO-inserted continue in create_continue_phis()
...
Otherwise, for loops without continue_or_break, create_continue_phis()
always returns an undef operand.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 638cbc21a1 ("aco: handle when ACO adds new continue edges")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2848
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7148 >
2020-10-27 19:53:38 +00:00
Rhys Perry
437995bb70
aco: remove all-undef phi opt
...
This doesn't look like it would create correct IR for 8/16-bit phis and
doesn't seem to help anything. If we ever want to do this, it's probably
better done in nir_opt_remove_phis().
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216 >
2020-10-27 15:24:38 +00:00
Rhys Perry
d20a752c0d
aco: use Builder::copy more
...
fossil-db (Navi):
Totals from 6973 (5.07% of 137413) affected shaders:
SGPRs: 381768 -> 381776 (+0.00%)
VGPRs: 306092 -> 306096 (+0.00%); split: -0.00%, +0.00%
CodeSize: 24440844 -> 24421196 (-0.08%); split: -0.09%, +0.01%
MaxWaves: 86581 -> 86583 (+0.00%)
Instrs: 4682161 -> 4679578 (-0.06%); split: -0.06%, +0.00%
Cycles: 68793116 -> 68261648 (-0.77%); split: -0.83%, +0.05%
fossil-db (Polaris):
Totals from 8154 (5.87% of 138881) affected shaders:
VGPRs: 338916 -> 338920 (+0.00%); split: -0.00%, +0.00%
CodeSize: 23540428 -> 23540488 (+0.00%); split: -0.00%, +0.00%
MaxWaves: 49090 -> 49091 (+0.00%)
Instrs: 4576085 -> 4576101 (+0.00%); split: -0.00%, +0.00%
Cycles: 51720704 -> 51720888 (+0.00%); split: -0.00%, +0.00%
Most of the Navi cycle/instruction changes are from 8/16-bit parallel-rdp
shaders. They appear to be improved because the p_create_vector from
lower_subdword_phis() was blocking constant propagation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216 >
2020-10-27 15:24:38 +00:00
Rhys Perry
72b307a338
aco: don't do divergent break+discard
...
If the shader does:
   loop {
      if (divergent)
         discard
      else
         a()
      b()
   }
then a()'s block will dominate b()'s block in the logical CFG, but not the
linear CFG. This will cause value numbering to try to combine SALU from
a() and b().
This didn't happen with break/continue because sanitize_if() would move
a() out of the branch. Using sanitize_if() to fix this doesn't look easy,
because discards are not control flow instructions in NIR.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216 >
2020-10-27 15:24:38 +00:00
Rhys Perry
27ce5d921e
aco: remove isel_context::allocated
...
Now that we have Program::temp_rc, we can replace it with the first
temporary id allocated for NIR's ssa defs.
No fossil-db changes on Navi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7067 >
2020-10-26 15:14:32 +00:00
Samuel Pitoiset
4e2fe34aa9
aco: fix determining if LOD is zero for nir_texop_txf/nir_texop_txs
...
txf/txs expect LOD to be a 32-bit unsigned integer while other
texture operations expect a float.
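A hedged sketch of a type-aware zero check, not the actual fix: lod_is_zero and its parameters are hypothetical. It only shows that the same 32 constant bits have to be interpreted according to the texture operation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: decide whether a constant LOD is zero, reading the
// raw 32 bits as an unsigned integer (txf/txs) or as a float (other ops).
bool lod_is_zero(uint32_t raw_bits, bool is_integer_lod)
{
   if (is_integer_lod)
      return raw_bits == 0;

   float lod;
   std::memcpy(&lod, &raw_bits, sizeof(lod));  // reinterpret without UB
   return lod == 0.0f;                         // also true for -0.0f
}
```

The two readings disagree for some bit patterns: 0x80000000 is -0.0 as a float, which compares equal to zero, but is a large non-zero value as an unsigned integer.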
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3668
Fixes: 93c8ebfa78 ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7256 >
2020-10-22 11:30:43 +00:00
Samuel Pitoiset
eb6877d3af
radv,aco: fix use of texop_samples_identical in the resolve meta path
...
The return value of this texture intrinsic should be a NIR 1-bit bool.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7236 >
2020-10-21 13:06:53 +02:00
Tony Wasserka
fd038132de
aco/isel: Miscellaneous cleanups using the new Stage API
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094 >
2020-10-21 09:49:38 +00:00
Tony Wasserka
34bc9477de
aco: Clean up symbol names and comments related to NGG
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094 >
2020-10-21 09:49:38 +00:00
Tony Wasserka
86c227c10c
aco: Use strong typing to model SW<->HW stage mappings
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094 >
2020-10-21 09:49:38 +00:00
Bas Nieuwenhuizen
76421667ec
aco: Add VK_KHR_shader_terminate_invocation support.
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7226 >
2020-10-20 22:53:08 +00:00
Timur Kristóf
d8435c1628
aco/ngg: Add assertion to make sure we always know the vertex count.
...
Just a sanity check to avoid hangs caused by missing this
in the future.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7213 >
2020-10-20 07:11:29 +00:00
James Park
af8d488ea5
util,ac,aco,radv: Cross-platform memstream API
...
POSIX memstream is not available on Windows.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7143 >
2020-10-19 03:37:42 -07:00
Rhys Perry
fdb65b8b23
aco: add missing SCC clobber in get_buffer_size
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: fcd6d83245 ("aco: fix imageSize()/textureSize() with large buffers on GFX8")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7162 >
2020-10-15 21:11:45 +00:00
Tony Wasserka
d5a72319d6
aco/isel: Remove now unused VS-related code from create_null_export
...
Also replaced a hardcoded constant with the appropriate register macro.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102 >
2020-10-14 16:22:51 +00:00
Tony Wasserka
c22c702f35
aco/isel: Remove some dead code
...
exported_pos was always initialized to true (due to the is_pos argument
of the first export_vs_varying call being true), so none of this code has
any effect.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102 >
2020-10-14 16:22:51 +00:00
Tony Wasserka
bf51b11c04
aco/isel: Always export position data from VS/NGG
...
AMD ISA docs explicitly require this for VS, and this likely extends to
NGG too.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3615
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7102 >
2020-10-14 16:22:51 +00:00
Daniel Schürmann
f29c81f863
aco: use VOP2 for v_cvt_pkrtz_f16_f32 if possible
...
This patch also does a slight rework of export_fs_mrt_color()
to avoid setting of enabled channels which are not used.
Totals from 52404 (38.38% of 136546) affected shaders (NAVI):
SGPRs: 3097443 -> 3097435 (-0.00%)
CodeSize: 189151600 -> 188546200 (-0.32%)
Instrs: 36445061 -> 36445104 (+0.00%); split: -0.00%, +0.00%
Cycles: 1739388020 -> 1739388192 (+0.00%); split: -0.00%, +0.00%
VMEM: 21071501 -> 21071665 (+0.00%); split: +0.00%, -0.00%
SMEM: 3470983 -> 3470982 (-0.00%); split: +0.00%, -0.00%
PreSGPRs: 2058965 -> 2058962 (-0.00%)
PreVGPRs: 1860294 -> 1860295 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777 >
2020-10-14 15:31:38 +00:00
Daniel Schürmann
7240edec2a
aco: use VOP2 version of v_cvt_pkrtz_f16_f32 on GFX_6_7_10
...
Totals from 767 (0.56% of 136546) affected shaders (NAVI):
CodeSize: 2862208 -> 2850036 (-0.43%)
Instrs: 561572 -> 561574 (+0.00%)
Cycles: 6455420 -> 6455428 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777 >
2020-10-14 15:31:38 +00:00
Daniel Schürmann
2f125908b3
radv,aco: lower_pack_half_2x16
...
This patch also optimizes pack_half_2x16(a, 0.0).
Totals from 1949 (1.43% of 136546) affected shaders (RAVEN):
SGPRs: 83376 -> 83336 (-0.05%)
CodeSize: 3532144 -> 3512352 (-0.56%)
Instrs: 660746 -> 660682 (-0.01%); split: -0.01%, +0.00%
Cycles: 6780716 -> 6780472 (-0.00%); split: -0.00%, +0.00%
VMEM: 990886 -> 990883 (-0.00%); split: +0.00%, -0.00%
SMEM: 150506 -> 150538 (+0.02%); split: +0.05%, -0.03%
SClause: 30595 -> 30594 (-0.00%); split: -0.01%, +0.00%
Copies: 40801 -> 40729 (-0.18%)
PreSGPRs: 52335 -> 52341 (+0.01%); split: -0.03%, +0.04%
PreVGPRs: 45104 -> 45097 (-0.02%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777 >
2020-10-14 15:31:38 +00:00
Daniel Schürmann
dae1e6f756
aco: use v_cvt_pkrtz_f16_f32 for pack_half_2x16
...
Apparently, we forgot to remove some debug code.
This patch also fixes the round mode check to consider
the destination bit width.
Totals from 2218 (1.62% of 136546) affected shaders (RAVEN):
SGPRs: 100848 -> 100280 (-0.56%)
VGPRs: 68536 -> 66044 (-3.64%); split: -3.68%, +0.05%
CodeSize: 4882296 -> 4837220 (-0.92%); split: -0.94%, +0.01%
MaxWaves: 18990 -> 19019 (+0.15%); split: +0.19%, -0.04%
Instrs: 938150 -> 930388 (-0.83%); split: -0.83%, +0.00%
Cycles: 8699824 -> 8667648 (-0.37%); split: -0.38%, +0.01%
VMEM: 1144502 -> 1059680 (-7.41%); split: +0.06%, -7.48%
SMEM: 170076 -> 167999 (-1.22%); split: +0.22%, -1.44%
VClause: 18428 -> 18422 (-0.03%)
SClause: 41375 -> 41353 (-0.05%); split: -0.06%, +0.00%
Copies: 60008 -> 60054 (+0.08%); split: -0.31%, +0.39%
PreVGPRs: 56163 -> 56142 (-0.04%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777 >
2020-10-14 15:31:38 +00:00
Daniel Schürmann
aec872cda0
aco: use p_split_vector for nir_op_unpack_half_*
...
This enables the use of SDWA if possible
Totals from 9933 (7.27% of 136546) affected shaders (RAVEN):
VGPRs: 731764 -> 731772 (+0.00%); split: -0.00%, +0.00%
CodeSize: 90944852 -> 90671472 (-0.30%); split: -0.30%, +0.00%
Instrs: 17881885 -> 17867831 (-0.08%); split: -0.08%, +0.00%
Cycles: 1597904072 -> 1597771260 (-0.01%); split: -0.01%, +0.00%
VMEM: 1702328 -> 1697383 (-0.29%); split: +0.13%, -0.42%
SMEM: 659583 -> 659049 (-0.08%); split: +0.01%, -0.09%
VClause: 318024 -> 318025 (+0.00%); split: -0.00%, +0.00%
SClause: 631670 -> 631707 (+0.01%); split: -0.01%, +0.01%
Copies: 1504107 -> 1504626 (+0.03%); split: -0.01%, +0.04%
PreVGPRs: 683153 -> 683180 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777 >
2020-10-14 15:31:38 +00:00
Daniel Schürmann
a38a497b86
aco: use p_create_vector for nir_op_pack_half_2x16
...
This enables the use of SDWA if possible
Totals from 2218 (1.62% of 136546) affected shaders (RAVEN):
VGPRs: 68508 -> 68516 (+0.01%)
CodeSize: 4897024 -> 4881068 (-0.33%); split: -0.33%, +0.00%
MaxWaves: 18992 -> 18990 (-0.01%)
Instrs: 946942 -> 939161 (-0.82%); split: -0.82%, +0.00%
Cycles: 8737668 -> 8705704 (-0.37%); split: -0.37%, +0.00%
VMEM: 1155362 -> 1145245 (-0.88%); split: +0.00%, -0.88%
SMEM: 170435 -> 170165 (-0.16%); split: +0.01%, -0.16%
VClause: 18426 -> 18425 (-0.01%)
SClause: 41376 -> 41375 (-0.00%)
Copies: 59813 -> 59787 (-0.04%); split: -0.15%, +0.10%
PreVGPRs: 56126 -> 56136 (+0.02%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777 >
2020-10-14 15:31:38 +00:00
Rhys Perry
c122315702
aco: fix get_ssbo_size with a vgpr resource
...
The result of load_vulkan_descriptor is passed directly to get_ssbo_size.
This caused convert_pointer_to_64_bit() to skip creating a
v_readfirstlane_b32 even when it was necessary.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 05b6612b4e ('radv: do not lower UBO/SSBO access to offsets')
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3628
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7095 >
2020-10-13 14:20:28 +00:00
Rhys Perry
bb5c0ba0d2
aco: implement last_invocation
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:21 +00:00
Rhys Perry
36da9c4aa2
aco: implement elect
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:20 +00:00
Rhys Perry
bf77f539ee
aco: optimize more uniform reductions/scans
...
Uniform atomic optimization will create these.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:20 +00:00
Samuel Pitoiset
b9ca4923d6
aco: implement missing nir_op_unpack_half_2x16_split_{x,y}_flush_to_zero
...
SPIRV->NIR emits nir_op_unpack_half_2x16_flush_to_zero instead of
nir_op_unpack_half_2x16 if the shader enables denorm flush to zero
for 16-bit floating point.
This doesn't fix anything known and CTS doesn't have tests.
Fixes: 56d9bcdded ("radv: enable more float_controls features")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6939 >
2020-10-13 08:35:22 +02:00