mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-04-18 12:30:47 +02:00
intel/brw: Allow SIMD16 F and HF type conversion moves
On DG2, the lowering generated for these MOV instructions is
**awful**. The original SIMD16 MOV
{ 18} 67: mov(16) vgrf54+0.0:HF, vgrf46+0.0:F NoMask group0
is lowered to SIMD8 MOVs:
{ 18} 118: mov(8) vgrf54+0.0:HF, vgrf46+0.0:F NoMask group0
{ 18} 119: mov(8) vgrf54+0.16:HF, vgrf46+1.0:F NoMask group8
These MOVs violate Gfx12.5 region restrictions, so they are further
lowered:
{ 17} 119: mov(8) vgrf83<2>:HF, vgrf46+0.0:F NoMask group0
{ 19} 120: mov(8) vgrf54+0.0:UW, vgrf83<2>:UW NoMask group0
{ 19} 122: mov(8) vgrf84<2>:HF, vgrf46+1.0:F NoMask group8
{ 19} 123: mov(8) vgrf54+0.16:UW, vgrf84<2>:UW NoMask group8
The shader-db and fossil-db results are nothing to get excited
about. However, the effect on vk_cooperative_matrix_perf is substantial. In one subtest
shader: shaders/shmemfp16.spv
cooperativeMatrixProps = 8x8x16 A = float16_t B = float16_t C = float16_t D = float16_t scope = subgroup
TILE_M=128 TILE_N=128, TILE_K=32 BLayout=0
performance on my DG2 improved by ~60% due to a MASSIVE reduction in spills and fills:
-Native code for unnamed compute shader (null) (src_hash 0x00000000) (sha1 c6a41b1c4e7aa2da327a39a70ed36c822a4b172f)
-SIMD32 shader: 32484 instructions. 1 loops. 1893868 cycles. 737:1820 spills:fills, 442 sends, scheduled with mode none. Promoted 1 constants. Compacted 519744 to 492224 bytes (5%)
- START B0 (20782 cycles)
+Native code for unnamed compute shader (null) (src_hash 0x00000000) (sha1 621e960daad5b5579b176717f24a315e7ea560a1)
+SIMD32 shader: 23918 instructions. 1 loops. 1089894 cycles. 432:1166 spills:fills, 442 sends, scheduled with mode none. Promoted 1 constants. Compacted 382688 to 353232 bytes (8%)
shader-db:
All Gfx9 and later platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19656270 -> 19653981 (-0.01%)
instructions in affected programs: 61810 -> 59521 (-3.70%)
helped: 116 / HURT: 0
total cycles in shared programs: 823368888 -> 823375854 (<.01%)
cycles in affected programs: 1165284 -> 1172250 (0.60%)
helped: 51 / HURT: 57
fossil-db:
DG2 and Meteor Lake had similar results. (Meteor Lake shown)
*** Shaders only in 'before' results are ignored:
fossil-db/steam-dxvk/total_war_warhammer3/2a3ed2ca632a7cb7/fs.32,
fossil-db/steam-dxvk/total_war_warhammer3/18b9d4a3b1961616/fs.32,
fossil-db/steam-dxvk/total_war_warhammer3/04ac9f3146a6db19/fs.32,
fossil-db/steam-dxvk/total_war_warhammer3/f37ebec6aa1b379a/fs.32,
fossil-db/steam-dxvk/total_war_warhammer3/255c987feb0d4310/fs.32, and 25 more
from 1 apps: fossil-db/steam-dxvk/total_war_warhammer3
Totals:
Instrs: 160946537 -> 160928389 (-0.01%); split: -0.01%, +0.00%
Cycles: 14125908620 -> 14125873958 (-0.00%); split: -0.00%, +0.00%
Totals from 1002 (0.15% of 652134) affected shaders:
Instrs: 411261 -> 393113 (-4.41%); split: -4.41%, +0.00%
Cycles: 16676735 -> 16642073 (-0.21%); split: -0.48%, +0.27%
Tiger Lake
Totals:
Instrs: 164511816 -> 164497202 (-0.01%); split: -0.01%, +0.00%
Cycles: 13801675722 -> 13801629397 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 7955168 -> 7955152 (-0.00%)
Send messages: 8544494 -> 8544486 (-0.00%)
Totals from 997 (0.15% of 651454) affected shaders:
Instrs: 460820 -> 446206 (-3.17%); split: -3.17%, +0.00%
Cycles: 16265514 -> 16219189 (-0.28%); split: -0.84%, +0.56%
Subgroup size: 17552 -> 17536 (-0.09%)
Send messages: 26045 -> 26037 (-0.03%)
Ice Lake
Totals:
Instrs: 165504747 -> 165489970 (-0.01%); split: -0.01%, +0.00%
Cycles: 15145244554 -> 15145149627 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 8107032 -> 8107016 (-0.00%)
Send messages: 8598680 -> 8598672 (-0.00%)
Spill count: 45427 -> 45423 (-0.01%)
Fill count: 74749 -> 74747 (-0.00%)
Totals from 1125 (0.17% of 656115) affected shaders:
Instrs: 521676 -> 506899 (-2.83%); split: -2.83%, +0.00%
Cycles: 19555434 -> 19460507 (-0.49%); split: -0.59%, +0.10%
Subgroup size: 21616 -> 21600 (-0.07%)
Send messages: 28623 -> 28615 (-0.03%)
Spill count: 603 -> 599 (-0.66%)
Fill count: 1362 -> 1360 (-0.15%)
Skylake
*** Shaders only in 'after' results are ignored:
fossil-db/steam-native/red_dead_redemption2/cef460b80bad8485/fs.16,
fossil-db/steam-native/red_dead_redemption2/cd5fe081e2e5529d/fs.16
from 1 apps: fossil-db/steam-native/red_dead_redemption2
Totals:
Instrs: 141607617 -> 141593776 (-0.01%); split: -0.01%, +0.00%
Cycles: 14257812441 -> 14257661671 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 7743752 -> 7743736 (-0.00%)
Send messages: 7552728 -> 7552720 (-0.00%)
Spill count: 43660 -> 43661 (+0.00%)
Fill count: 71301 -> 71303 (+0.00%)
Totals from 1017 (0.16% of 636964) affected shaders:
Instrs: 392454 -> 378613 (-3.53%); split: -3.53%, +0.00%
Cycles: 16622974 -> 16472204 (-0.91%); split: -1.04%, +0.13%
Subgroup size: 19840 -> 19824 (-0.08%)
Send messages: 23021 -> 23013 (-0.03%)
Spill count: 484 -> 485 (+0.21%)
Fill count: 1155 -> 1157 (+0.17%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>
parent 66dc6e07f5
commit cd70e49394
2 changed files with 24 additions and 29 deletions
@@ -1137,7 +1137,8 @@ special_restrictions_for_mixed_float_mode(const struct brw_isa_info *isa,
     * "No SIMD16 in mixed mode when destination is f32. Instruction
     * execution size must be no more than 8."
     */
-   ERROR_IF(exec_size > 8 && dst_type == BRW_REGISTER_TYPE_F,
+   ERROR_IF(exec_size > 8 && dst_type == BRW_REGISTER_TYPE_F &&
+            opcode != BRW_OPCODE_MOV,
             "Mixed float mode with 32-bit float destination is limited "
             "to SIMD8");
@@ -1212,7 +1213,8 @@ special_restrictions_for_mixed_float_mode(const struct brw_isa_info *isa,
     * Align1 and Align16."
     */
    ERROR_IF(exec_size > 8 && dst_is_packed &&
-            dst_type == BRW_REGISTER_TYPE_HF,
+            dst_type == BRW_REGISTER_TYPE_HF &&
+            opcode != BRW_OPCODE_MOV,
             "Align1 mixed float mode is limited to SIMD8 when destination "
             "is packed half-float");
@@ -113,34 +113,27 @@ get_fpu_lowered_simd_width(const fs_visitor *shader,
    if (inst->is_3src(compiler) && !devinfo->supports_simd16_3src)
       max_width = MIN2(max_width, inst->exec_size / reg_count);

-   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
-    * Float Operations:
-    *
-    * "No SIMD16 in mixed mode when destination is f32. Instruction
-    * execution size must be no more than 8."
-    *
-    * FIXME: the simulator doesn't seem to complain if we don't do this and
-    * empirical testing with existing CTS tests show that they pass just fine
-    * without implementing this, however, since our interpretation of the PRM
-    * is that conversion MOVs between HF and F are still mixed-float
-    * instructions (and therefore subject to this restriction) we decided to
-    * split them to be safe. Might be useful to do additional investigation to
-    * lift the restriction if we can ensure that it is safe though, since these
-    * conversions are common when half-float types are involved since many
-    * instructions do not support HF types and conversions from/to F are
-    * required.
-    */
-   if (is_mixed_float_with_fp32_dst(inst) && devinfo->ver < 20)
-      max_width = MIN2(max_width, 8);
+   if (inst->opcode != BRW_OPCODE_MOV) {
+      /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
+       * Float Operations:
+       *
+       * "No SIMD16 in mixed mode when destination is f32. Instruction
+       * execution size must be no more than 8."
+       *
+       * Testing indicates that this restriction does not apply to MOVs.
+       */
+      if (is_mixed_float_with_fp32_dst(inst) && devinfo->ver < 20)
+         max_width = MIN2(max_width, 8);

-   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
-    * Float Operations:
-    *
-    * "No SIMD16 in mixed mode when destination is packed f16 for both
-    * Align1 and Align16."
-    */
-   if (is_mixed_float_with_packed_fp16_dst(inst) && devinfo->ver < 20)
-      max_width = MIN2(max_width, 8);
+      /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
+       * Float Operations:
+       *
+       * "No SIMD16 in mixed mode when destination is packed f16 for both
+       * Align1 and Align16."
+       */
+      if (is_mixed_float_with_packed_fp16_dst(inst) && devinfo->ver < 20)
+         max_width = MIN2(max_width, 8);
+   }

    /* Only power-of-two execution sizes are representable in the instruction
     * control fields.