Rhys Perry
f4e649a205
aco: fix various s_subb_u32 operands to SCC
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8007 >
2020-12-09 19:06:34 +00:00
Jonathan Gray
ebfb9e1817
aco: use UINT64_C on 64 bit constant arguments
...
avoids errors seen when building on OpenBSD/amd64
../src/amd/compiler/aco_instruction_selection.cpp:1677:62: error: ambiguous conversion for functional-style cast from 'unsigned long' to 'aco::Operand'
bld.vop3(aco_opcode::v_mul_f64, Definition(dst), Operand(0x3FF0000000000000lu), tmp);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
glibc uses unsigned long for uint64_t on LP64 archs and unsigned long long for
uint64_t on ILP32 archs. On OpenBSD unsigned long long is used for uint64_t
on all archs.
The Operand constructors are uint8_t uint16_t uint32_t uint64_t
use UINT64_C so lu or llu suffix will be used as needed.
Fixes: df645fa369 ("aco: implement VK_KHR_shader_float_controls")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7944 >
2020-12-08 11:11:28 +00:00
Samuel Pitoiset
562dd79bfa
radv: fix using FS sample shading if the linker optimized inputs away
...
During NIR linking, constant varyings might be moved to the next
stage and the sample qualifier removed.
shader_info::uses_sample_shading remembers if the sample qualifier
was used before optimizations.
No fossils-db changes on Sienna Cichlid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7892 >
2020-12-07 11:42:17 +00:00
Timur Kristóf
94f8cb29ee
aco: Fix NGG GS assert failure from the WG scan.
...
There was a temp which was defined in a branch but used outside,
without a phi.
Fixes: 62b5012ec3
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7817 >
2020-12-01 15:04:58 +00:00
Samuel Pitoiset
3a858ecd40
Revert "radv/llvm,aco: always split typed vertex buffer loads on GFX6 and GFX10+"
...
It introduces regressions.
This reverts commit 6fb4babfe9 .
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7852 >
2020-12-01 14:31:16 +01:00
James Park
93094b8c5e
aco: Remove nonstandard parentheses
...
Remove parentheses in cases where a parenthesized type is followed by an
initializer list.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7785 >
2020-12-01 11:08:21 +00:00
James Park
d1f742e497
aco: Add missing C++ includes
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7785 >
2020-12-01 11:08:21 +00:00
James Park
e352ebf88e
aco: Fix warnings about unsafe integer/bool mix
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7785 >
2020-12-01 11:08:21 +00:00
Rhys Perry
2d12991e01
aco: use FALLTHROUGH macro
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7844 >
2020-12-01 10:28:36 +00:00
Samuel Pitoiset
6fb4babfe9
radv/llvm,aco: always split typed vertex buffer loads on GFX6 and GFX10+
...
To avoid any alignment issues that triggers memory violations and
eventually a GPU. This can happen if the stride (static or dynamic)
is unaligned and also if the VBO offset is aligned to scalar
(eg. stride is 8 and VBO offset is 2 for R16G16B16A16_SNORM).
The AMD Windows driver also always splits typed vertex fetches.
fossils-db (Sienna Cichlid):
Totals from 56508 (40.54% of 139391) affected shaders:
SGPRs: 2643545 -> 2664516 (+0.79%); split: -0.19%, +0.98%
VGPRs: 2007472 -> 1995408 (-0.60%); split: -0.74%, +0.13%
CodeSize: 70596372 -> 73913312 (+4.70%); split: -0.00%, +4.70%
MaxWaves: 772653 -> 774916 (+0.29%); split: +0.37%, -0.08%
Instrs: 14074162 -> 14567072 (+3.50%); split: -0.00%, +3.51%
Cycles: 69281276 -> 71253252 (+2.85%); split: -0.00%, +2.85%
VMEM: 22047039 -> 25554196 (+15.91%); split: +17.20%, -1.29%
SMEM: 4120370 -> 4360820 (+5.84%); split: +7.41%, -1.58%
VClause: 416913 -> 438361 (+5.14%); split: -1.86%, +7.01%
SClause: 536739 -> 542637 (+1.10%); split: -0.33%, +1.43%
Copies: 977194 -> 970015 (-0.73%); split: -2.43%, +1.69%
Branches: 241205 -> 241193 (-0.00%); split: -0.06%, +0.06%
PreVGPRs: 1505645 -> 1505379 (-0.02%)
This fixes GPU hangs with bin/draw-vertices from Piglit on GFX10+
with Zink.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7751 >
2020-12-01 10:14:27 +00:00
Rhys Perry
5cf41814cd
aco: use binding chasing helpers
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7291 >
2020-11-27 13:07:07 +00:00
Rhys Perry
37a2c9ace6
aco: fix GS with no outputs
...
With NGG, ngg_gs_known_vtxcnt[0] would be false and ngg_gs_finale() would
assert.
With legacy GS, I don't know why the assertion was there.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7576 >
2020-11-25 13:41:04 +00:00
Rhys Perry
fdfa96561e
radv/llvm,aco/ngg: fix large shift exponent in ngg_gs_vertex_lds_addr
...
When vertices_out=0, we will try to shift 1u by UINT32_MAX.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7576 >
2020-11-25 13:41:04 +00:00
Rhys Perry
cf0b54cdc1
aco: fix v_mul_hi_u32_u24 format
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 57c152af9c ("aco: select v_mul_{hi}_u32_u24 for 24-bit multiplications")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3874
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7759 >
2020-11-25 10:59:44 +00:00
Samuel Pitoiset
8e961b91c3
aco: optimize v_add+v_lshlrev to v_mad_u32_u24 on GFX6-8
...
This optimizes v_add(c, v_lshlrev(a, b)) to v_mad_u32_u24(b, 1<<a, c)
if 'a' is a constant (less than or equal to 6 to avoid creating
literals) and 'b' known to be a 16-bit or a 24-bit value.
On GFX9+, this is already optimized to v_lshl_add_u32.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7673 >
2020-11-23 18:34:40 +00:00
Samuel Pitoiset
d9e4504b0d
aco: optimize v_add+s_lshl to v_mad_u32_u24 on GFX6-8
...
This optimizes v_add(c, s_lshl(a, b)) to v_mad_u32_u24(a, 1<<b, c)
if 'b' is a constant (less than or equal to 6 to avoid creating
literals) and 'a' known to be a 16-bit or a 24-bit value.
On GFX9+, this is already optimized to v_lshl_add_u32.
fossils-db (Polaris10):
Totals from 1916 (1.36% of 140385) affected shaders:
SGPRs: 88322 -> 87780 (-0.61%); split: -0.66%, +0.05%
CodeSize: 7852668 -> 7851800 (-0.01%); split: -0.01%, +0.00%
Instrs: 1533965 -> 1530459 (-0.23%); split: -0.23%, +0.00%
Cycles: 57001852 -> 56983244 (-0.03%); split: -0.03%, +0.00%
VMEM: 372561 -> 371733 (-0.22%); split: +0.03%, -0.25%
SMEM: 108859 -> 103711 (-4.73%); split: +0.23%, -4.96%
VClause: 37231 -> 37204 (-0.07%)
SClause: 58116 -> 58086 (-0.05%); split: -0.06%, +0.01%
Copies: 199953 -> 199931 (-0.01%); split: -0.03%, +0.02%
Branches: 63478 -> 63477 (-0.00%)
PreSGPRs: 61818 -> 61816 (-0.00%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7673 >
2020-11-23 18:34:40 +00:00
Samuel Pitoiset
eaef1f2127
aco: allow to use the range analysis UB in emit_{sop2,vop2}_instruction()
...
It will allow to combine v_add+s_lshl or v_add+v_lshlrev to
v_mad_u32_u24 on GFX6-8 if operands are known to be 16-bit or 24-bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7673 >
2020-11-23 18:34:40 +00:00
Rhys Perry
aab507c6b0
aco: use v_mul_imm() for some nir_op_imul
...
Some of the optimizations v_mul_imm() does are complex and very
target-specific and not suitable to do in ACO's optimizer.
fossil-db (Vega):
Totals from 49135 (35.76% of 137413) affected shaders:
SGPRs: 2698547 -> 2696103 (-0.09%); split: -0.16%, +0.07%
VGPRs: 2301412 -> 2301600 (+0.01%); split: -0.01%, +0.02%
SpillSGPRs: 51520 -> 51519 (-0.00%)
CodeSize: 168798572 -> 169164012 (+0.22%); split: -0.00%, +0.22%
MaxWaves: 306553 -> 306539 (-0.00%); split: +0.00%, -0.01%
Instrs: 33423982 -> 33506598 (+0.25%); split: -0.00%, +0.25%
Cycles: 1807800632 -> 1804101376 (-0.20%); split: -0.20%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5390 >
2020-11-20 19:50:32 +00:00
Tony Wasserka
2bb8874320
aco: Fix -Wshadow warnings
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7430 >
2020-11-20 09:29:19 +00:00
Rhys Perry
867323379e
aco: don't use SMEM for SSBO stores
...
fossil-db (Navi):
Totals from 70 (0.05% of 138791) affected shaders:
SGPRs: 2324 -> 2097 (-9.77%)
VGPRs: 1344 -> 1480 (+10.12%)
CodeSize: 157872 -> 154836 (-1.92%); split: -1.93%, +0.01%
MaxWaves: 1288 -> 1260 (-2.17%)
Instrs: 29730 -> 29108 (-2.09%); split: -2.13%, +0.04%
Cycles: 394944 -> 391280 (-0.93%); split: -0.94%, +0.01%
VMEM: 5288 -> 5695 (+7.70%); split: +11.97%, -4.27%
SMEM: 2680 -> 2444 (-8.81%); split: +1.34%, -10.15%
VClause: 291 -> 502 (+72.51%)
SClause: 1176 -> 918 (-21.94%)
Copies: 3549 -> 3517 (-0.90%); split: -1.80%, +0.90%
Branches: 1230 -> 1228 (-0.16%)
PreSGPRs: 1675 -> 1491 (-10.99%)
PreVGPRs: 1101 -> 1223 (+11.08%)
Totals from 70 (0.05% of 139517) affected shaders (RAVEN):
SGPRs: 2368 -> 2121 (-10.43%)
VGPRs: 1344 -> 1480 (+10.12%)
CodeSize: 156664 -> 153252 (-2.18%)
MaxWaves: 636 -> 622 (-2.20%)
Instrs: 29968 -> 29226 (-2.48%)
Cycles: 398284 -> 393492 (-1.20%)
VMEM: 5544 -> 5930 (+6.96%); split: +11.72%, -4.76%
SMEM: 2752 -> 2502 (-9.08%); split: +1.20%, -10.28%
VClause: 292 -> 504 (+72.60%)
SClause: 1236 -> 940 (-23.95%)
Copies: 3907 -> 3852 (-1.41%); split: -2.20%, +0.79%
Branches: 1230 -> 1228 (-0.16%)
PreSGPRs: 1671 -> 1487 (-11.01%)
PreVGPRs: 1102 -> 1225 (+11.16%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6143 >
2020-11-16 15:52:22 +00:00
Samuel Pitoiset
20e48551ac
aco: select v_mul_lo_u16 for 16-bit multiplications that can't overflow
...
Only on GFX8-9 because GFX10 doesn't zero the upper 16 bits.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7425 >
2020-11-12 12:32:26 +00:00
Samuel Pitoiset
7028e9875f
aco: select v_mad_u32_u16 for 16-bit multiplications on GFX9+
...
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7425 >
2020-11-12 12:32:26 +00:00
Rhys Perry
5b81e80fb6
aco: implement 64-bit images
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7234 >
2020-11-09 18:28:59 +00:00
Jason Ekstrand
21b1b91549
nir,spirv: Add support for the ShaderCallKHR scope
...
It's currently entirely trivial.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479 >
2020-11-05 23:36:46 +00:00
Rhys Perry
786828131a
aco: implement 8/16-bit instructions which can be trivially widened
...
When nir_lower_bit_size becomes more capable, we might want to revert some
of this.
fossil-db (parallel-rdp, Navi):
Totals from 217 (31.77% of 683) affected shaders:
SGPRs: 11320 -> 10200 (-9.89%)
VGPRs: 7156 -> 7364 (+2.91%)
CodeSize: 1453948 -> 1430136 (-1.64%); split: -1.66%, +0.02%
Instrs: 258530 -> 254840 (-1.43%); split: -1.44%, +0.01%
Cycles: 37334360 -> 37247936 (-0.23%); split: -0.26%, +0.03%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791 >
2020-11-04 11:50:37 +00:00
Rhys Perry
ef95ba8cdd
aco: implement some 16-bit arithmetic instead of lowering
...
fossil-db (parallel-rdp, Navi):
Totals from 210 (30.75% of 683) affected shaders:
SGPRs: 9704 -> 10248 (+5.61%)
VGPRs: 5884 -> 5368 (-8.77%)
CodeSize: 1155564 -> 1098752 (-4.92%)
Instrs: 199927 -> 189940 (-5.00%)
Cycles: 20438392 -> 19860124 (-2.83%)
v2: use divergence analysis to determine which instructions to lower.
Co-Authored-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791 >
2020-11-04 11:50:37 +00:00
Samuel Pitoiset
57c152af9c
aco: select v_mul_{hi}_u32_u24 for 24-bit multiplications
...
This is based on the NIR range analysis. v_mul_u32_u24 is VOP2, while
v_mul_lo_u32 is VOP3, so that should reduce codesize.
fossils-db (Vega10):
Totals from 12590 (9.22% of 136546) affected shaders:
SGPRs: 680207 -> 677271 (-0.43%); split: -0.47%, +0.04%
VGPRs: 620840 -> 620856 (+0.00%); split: -0.02%, +0.02%
CodeSize: 37930200 -> 37774088 (-0.41%); split: -0.41%, +0.00%
Instrs: 7463550 -> 7458120 (-0.07%); split: -0.07%, +0.00%
Cycles: 133487628 -> 133427532 (-0.05%); split: -0.05%, +0.00%
VMEM: 2514729 -> 2513426 (-0.05%); split: +0.02%, -0.08%
SMEM: 1533579 -> 1532795 (-0.05%); split: +0.05%, -0.10%
VClause: 231391 -> 231389 (-0.00%); split: -0.01%, +0.00%
SClause: 255352 -> 255294 (-0.02%); split: -0.04%, +0.02%
Copies: 605821 -> 600352 (-0.90%); split: -0.92%, +0.02%
Branches: 133739 -> 133743 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 351092 -> 348048 (-0.87%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405 >
2020-11-03 13:47:40 +00:00
Samuel Pitoiset
3a72021d7c
aco: store NIR range analysis data to the isel context
...
It will be used to optimize some ALU instructions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7405 >
2020-11-03 13:47:40 +00:00
James Park
4bd18e772a
amd/llvm,aco: Replace VLA with alloca
...
MSVC will never support VLA, so use alloca instead.
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7157 >
2020-11-03 07:44:02 +00:00
Samuel Pitoiset
03f260cb27
radv,aco: optimize computing the sample mask for per-sample shading
...
I don't know why these values were introduced for but it seems like
we can optimize this by just doing:
gl_SampleMaskIn[0] = (SampleCoverage & (1 << gl_SampleID))
AMDGPU-PRO and AMDVLK apply the same formula to compute the
sample mask when per-sample shading is enabled.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377 >
2020-11-02 08:05:47 +01:00
Samuel Pitoiset
c63bcda22c
radv,aco: adjust the sample mask only if per-sample shading is enabled
...
When per-sample shading isn't enabled, we can just load the
samplemask from the hardware which is always the coverage of
the entire pixel/fragment.
fossilds-db (VEGA10):
Totals from 131 (0.10% of 136546) affected shaders:
SGPRs: 5056 -> 5048 (-0.16%)
VGPRs: 2600 -> 2372 (-8.77%)
CodeSize: 115788 -> 112560 (-2.79%)
MaxWaves: 1266 -> 1274 (+0.63%)
Instrs: 20620 -> 20071 (-2.66%)
Cycles: 82416 -> 80220 (-2.66%)
VMEM: 51567 -> 35532 (-31.10%); split: +0.24%, -31.34%
SMEM: 8952 -> 8258 (-7.75%); split: +0.11%, -7.86%
SClause: 1223 -> 1199 (-1.96%); split: -2.62%, +0.65%
Copies: 1247 -> 1124 (-9.86%); split: -10.18%, +0.32%
PreVGPRs: 2112 -> 1981 (-6.20%)
Helps Britannia, Shadow of the Tomb Raider, Warhammer II and Control.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377 >
2020-11-02 08:05:43 +01:00
Daniel Schürmann
f4c090a3b3
aco: refactor split_store_data() to always split into evenly sized elements
...
This fixes a couple of issues on GFX67 and
has no negative impact on newer hardware
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7105 >
2020-10-29 14:32:59 +00:00
Timur Kristóf
09b9e52c0d
aco/ngg: Export a zero-area triangle when primitive count is 0.
...
This is a workaround for a bug in Navi 1x NGG HW.
Very rarely, the Navi 1x PA can hang when an NGG workgroup exports
0 total primitives. According to AMD, we always need this workaround
when it is possible that the number of primitives is 0.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:55:47 +01:00
Timur Kristóf
b6654adc0e
aco: Make emitting reduction instructions a bit more convenient.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:47:22 +01:00
Timur Kristóf
260f9c503a
aco/ngg: Put shader query reduction operand into a VGPR.
...
The p_reduce instruction only works if this operand is in a VGPR,
and otherwise gets lowered to incorrect code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:47:22 +01:00
Timur Kristóf
9757c3cb6b
aco: Assert that workgroup barriers are not used inappropriately.
...
Example:
It is possible for some NGG GS waves to have 0 ES and/or GS invocations,
and in that case having an s_barrier inside divergent control flow can
very possibly hang the GPU.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232 >
2020-10-28 21:47:19 +01:00
Rhys Perry
483657de32
aco: use mubuf helper in select_gs_copy_shader
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103 >
2020-10-28 14:59:49 +00:00
Rhys Perry
ec7ecfe9cb
aco: use control flow creation helpers in select_gs_copy_shader
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6103 >
2020-10-28 14:59:49 +00:00
Daniel Schürmann
543f50789a
aco: implement nir_op_unpack_[64/32]_*
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527 >
2020-10-28 10:14:26 +00:00
Rhys Perry
26e53e3afa
aco: ignore the ACO-inserted continue in create_continue_phis()
...
Otherwise, for loops without continue_or_break, create_continue_phis()
always returns an undef operand.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 638cbc21a1 ("aco: handle when ACO adds new continue edges")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2848
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7148 >
2020-10-27 19:53:38 +00:00
Rhys Perry
437995bb70
aco: remove all-undef phi opt
...
This doesn't look like it would create correct IR for 8/16-bit phis and
doesn't seem to help anything. If we ever want to do this, it's probably
better done in nir_opt_remove_phis().
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216 >
2020-10-27 15:24:38 +00:00
Rhys Perry
d20a752c0d
aco: use Builder::copy more
...
fossil-db (Navi):
Totals from 6973 (5.07% of 137413) affected shaders:
SGPRs: 381768 -> 381776 (+0.00%)
VGPRs: 306092 -> 306096 (+0.00%); split: -0.00%, +0.00%
CodeSize: 24440844 -> 24421196 (-0.08%); split: -0.09%, +0.01%
MaxWaves: 86581 -> 86583 (+0.00%)
Instrs: 4682161 -> 4679578 (-0.06%); split: -0.06%, +0.00%
Cycles: 68793116 -> 68261648 (-0.77%); split: -0.83%, +0.05%
fossil-db (Polaris):
Totals from 8154 (5.87% of 138881) affected shaders:
VGPRs: 338916 -> 338920 (+0.00%); split: -0.00%, +0.00%
CodeSize: 23540428 -> 23540488 (+0.00%); split: -0.00%, +0.00%
MaxWaves: 49090 -> 49091 (+0.00%)
Instrs: 4576085 -> 4576101 (+0.00%); split: -0.00%, +0.00%
Cycles: 51720704 -> 51720888 (+0.00%); split: -0.00%, +0.00%
Most of the Navi cycle/instruction changes are from 8/16-bit parallel-rdp
shaders. They appear to be improved because the p_create_vector from
lower_subdword_phis() was blocking constant propagation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216 >
2020-10-27 15:24:38 +00:00
Rhys Perry
72b307a338
aco: don't do divergent break+discard
...
If the shader does:
loop {
if (divergent)
discard
else
a()
b()
}
then a()'s block will dominate b()'s block in the logical CFG, but not the
linear CFG. This will cause value numbering to try to combine SLAU from
a() and b().
This didn't happen with break/continue because sanitize_if() would move
a() out of the branch. Using sanitize_if() to fix this doesn't look easy,
because discards are not control flow instructions in NIR.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216 >
2020-10-27 15:24:38 +00:00
Rhys Perry
27ce5d921e
aco: remove isel_context::allocated
...
Now that we have Program::temp_rc, we can replace it with the first
temporary id allocated for NIR's ssa defs.
No fossil-db changes on Navi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7067 >
2020-10-26 15:14:32 +00:00
Samuel Pitoiset
4e2fe34aa9
aco: fix determining if LOD is zero for nir_texop_txf/nir_texop_txs
...
txf/txs expects LOD to be a 32-bit unsigned integer while other
texture operations expects a float.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3668
Fixes: 93c8ebfa78 ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7256 >
2020-10-22 11:30:43 +00:00
Samuel Pitoiset
eb6877d3af
radv,aco: fix use of texop_samples_identical in the resolve meta path
...
The return value of this texture intrinsic should be a NIR 1-bit bool.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7236 >
2020-10-21 13:06:53 +02:00
Tony Wasserka
fd038132de
aco/isel: Miscellaneous cleanups using the new Stage API
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094 >
2020-10-21 09:49:38 +00:00
Tony Wasserka
34bc9477de
aco: Clean up symbol names and comments related to NGG
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094 >
2020-10-21 09:49:38 +00:00
Tony Wasserka
86c227c10c
aco: Use strong typing to model SW<->HW stage mappings
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7094 >
2020-10-21 09:49:38 +00:00
Bas Nieuwenhuizen
76421667ec
aco: Add VK_KHR_shader_terminate_invocation support.
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7226 >
2020-10-20 22:53:08 +00:00