Joshua Ashton
ce7966fcb4
aco: Use movk for AddressHi bits in vertex prolog
...
Eliminates a useless extra dword by transforming
s_mov_b32 s47, 0xffff8000 ; beaf03ff ffff8000
to
s_movk_i32 s47, 0x8000 ; b02f8000
which does the same thing.
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15415 >
2022-05-03 10:15:36 +00:00
Samuel Pitoiset
6873da0e42
aco: fix load_barycentric_at_{sample,offset} on GFX6-7
...
The computation was wrong.
Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.*
with Zink on GFX6 (Pitcairn).
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16099 >
2022-04-25 07:17:39 +00:00
Georg Lehmann
d12b5e7633
aco: Reuse previous -1 result in find_msb to avoid using VOP3.
...
Totals:
CodeSize: 388934388 -> 388933712 (-0.00%)
Totals from 208 (0.15% of 134913) affected shaders:
CodeSize: 2008016 -> 2007340 (-0.03%)
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16011 >
2022-04-19 15:18:58 +00:00
Rhys Perry
c883abda76
aco: implement load_shared2_amd/store_shared2_amd
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13778 >
2022-04-13 23:08:07 +00:00
Daniel Schürmann
d703a0e808
aco: remove register hints entirely
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408 >
2022-04-13 21:52:43 +00:00
Daniel Schürmann
2fe005a3fe
aco: remove occurences of VCC hint
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408 >
2022-04-13 21:52:43 +00:00
Rhys Perry
7478b00c7c
aco: remove old global access intrinsics
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124 >
2022-04-13 16:23:35 +00:00
Rhys Perry
9d1bab3615
aco: increase global_load_params.max_const_offset_plus_one
...
The callback now supports this. This shouldn't have any effect yet except
on GFX6 with 12 byte loads.
fossil-db (Pitcairn):
Totals from 246 (0.18% of 135668) affected shaders:
VGPRs: 14684 -> 14768 (+0.57%); split: -0.44%, +1.01%
CodeSize: 1765792 -> 1738040 (-1.57%)
Instrs: 344605 -> 340055 (-1.32%)
Latency: 4892904 -> 4861942 (-0.63%)
InvThroughput: 2479599 -> 2446070 (-1.35%)
VClause: 8782 -> 8735 (-0.54%)
SClause: 9854 -> 9853 (-0.01%)
Copies: 47327 -> 45401 (-4.07%); split: -4.08%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124 >
2022-04-13 16:23:35 +00:00
Rhys Perry
3e9517c757
aco: implement _amd global access intrinsics
...
fossil-db (Sienna Cichlid):
Totals from 7 (0.01% of 134621) affected shaders:
VGPRs: 760 -> 776 (+2.11%)
CodeSize: 222000 -> 222044 (+0.02%); split: -0.01%, +0.03%
Instrs: 40959 -> 40987 (+0.07%); split: -0.01%, +0.08%
Latency: 874811 -> 886609 (+1.35%); split: -0.00%, +1.35%
InvThroughput: 437405 -> 443303 (+1.35%); split: -0.00%, +1.35%
VClause: 1242 -> 1240 (-0.16%)
SClause: 1050 -> 1049 (-0.10%); split: -0.19%, +0.10%
Copies: 4953 -> 4973 (+0.40%); split: -0.04%, +0.44%
Branches: 1947 -> 1957 (+0.51%); split: -0.05%, +0.56%
PreVGPRs: 741 -> 747 (+0.81%)
fossil-db changes seem to be noise.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124 >
2022-04-13 16:23:35 +00:00
Rhys Perry
391bf3ea30
aco: don't expand smem/mubuf global loads
...
For example, dwordx3->dwordx4 or ubyte3->dwordx2.
Global loads don't have the bounds checking that buffer loads have that
makes this safe.
The alignment checks are added to global_load_callback() in case
byte_align_loads=false, align=1 and bytes_needed=3. Without them, the
callback will create a dword load.
fossil-db (Sienna Cichlid):
Totals from 267 (0.20% of 134621) affected shaders:
CodeSize: 1603352 -> 1606568 (+0.20%)
Instrs: 294946 -> 295482 (+0.18%); split: -0.00%, +0.18%
Latency: 2997003 -> 2997052 (+0.00%); split: -0.02%, +0.02%
InvThroughput: 526645 -> 526659 (+0.00%)
SClause: 9179 -> 9185 (+0.07%); split: -0.02%, +0.09%
Copies: 25363 -> 25375 (+0.05%); split: -0.08%, +0.13%
Branches: 8298 -> 8299 (+0.01%)
fossil-db (Polaris10):
Totals from 267 (0.20% of 135668) affected shaders:
CodeSize: 1636672 -> 1638756 (+0.13%); split: -0.00%, +0.13%
Instrs: 308484 -> 308733 (+0.08%); split: -0.01%, +0.09%
Latency: 3446045 -> 3446904 (+0.02%); split: -0.00%, +0.03%
InvThroughput: 1206722 -> 1206828 (+0.01%); split: -0.00%, +0.01%
SClause: 9308 -> 9311 (+0.03%); split: -0.08%, +0.11%
Copies: 36933 -> 36921 (-0.03%); split: -0.08%, +0.05%
fossil-db (Pitcairn):
Totals from 275 (0.20% of 135668) affected shaders:
SGPRs: 17616 -> 17520 (-0.54%); split: -0.64%, +0.09%
VGPRs: 15428 -> 15540 (+0.73%); split: -0.23%, +0.96%
CodeSize: 1885792 -> 1929120 (+2.30%); split: -0.00%, +2.30%
MaxWaves: 1284 -> 1285 (+0.08%)
Instrs: 368963 -> 376095 (+1.93%); split: -0.00%, +1.94%
Latency: 5122922 -> 5168398 (+0.89%); split: -0.01%, +0.90%
InvThroughput: 2562866 -> 2604279 (+1.62%)
VClause: 9268 -> 9296 (+0.30%); split: -0.13%, +0.43%
SClause: 10702 -> 10705 (+0.03%); split: -0.05%, +0.07%
Copies: 48620 -> 50629 (+4.13%); split: -0.08%, +4.21%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124 >
2022-04-13 16:23:35 +00:00
Rhys Perry
6baad09711
aco: use saddr for global access with sgpr address
...
fossil-db (Sienna Cichlid):
Totals from 38 (0.03% of 134621) affected shaders:
CodeSize: 237196 -> 237060 (-0.06%); split: -0.09%, +0.03%
Instrs: 43895 -> 43894 (-0.00%); split: -0.02%, +0.01%
Latency: 914633 -> 916263 (+0.18%); split: -0.01%, +0.19%
InvThroughput: 468215 -> 468971 (+0.16%); split: -0.02%, +0.18%
SClause: 1239 -> 1242 (+0.24%)
PreSGPRs: 997 -> 1003 (+0.60%)
PreVGPRs: 936 -> 923 (-1.39%); split: -1.50%, +0.11%
Regression seems to be RA noise, creating a waitcnt.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124 >
2022-04-13 16:23:35 +00:00
Timur Kristóf
b874a2ed61
aco: Fix VOP2 instruction format in visit_tex.
...
There was a v_or_b32 that accidentally used SOP2.
It should use VOP2.
Issue found by looking at a gfxreconstruct trace posted by a user
in this bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5838
Cc: mesa-stable
Fixes: 93c8ebfa78 "aco: Initial commit of independent AMD compiler"
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15923 >
2022-04-13 13:06:49 +00:00
Rhys Perry
773c7cbcbc
radv,aco: implement 64-bit inline push constants
...
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 1932 -> 1560 (-19.25%)
Instrs: 357 -> 303 (-15.13%)
Latency: 6576 -> 5883 (-10.54%)
InvThroughput: 26304 -> 23532 (-10.54%)
SClause: 42 -> 24 (-42.86%)
Copies: 90 -> 105 (+16.67%); split: -10.00%, +26.67%
PreSGPRs: 144 -> 201 (+39.58%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145 >
2022-04-12 11:44:30 +00:00
Rhys Perry
7f6262bb85
radv: allow holes in inline push constants
...
Use a dword mask instead of a range to track which push constants to
inline.
fossil-db (Sienna Cichlid):
Totals from 5724 (4.25% of 134621) affected shaders:
CodeSize: 20894044 -> 20815748 (-0.37%); split: -0.39%, +0.02%
Instrs: 4002568 -> 3988385 (-0.35%); split: -0.38%, +0.02%
Latency: 29285060 -> 29224414 (-0.21%); split: -0.22%, +0.01%
InvThroughput: 5529700 -> 5526893 (-0.05%); split: -0.05%, +0.00%
VClause: 78093 -> 78240 (+0.19%); split: -0.23%, +0.41%
SClause: 135495 -> 131027 (-3.30%); split: -3.30%, +0.00%
Copies: 330856 -> 324552 (-1.91%); split: -2.37%, +0.46%
PreSGPRs: 226031 -> 224778 (-0.55%); split: -0.61%, +0.05%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145 >
2022-04-12 11:44:30 +00:00
Samuel Pitoiset
d32656bc65
radv: lower has_multiview_view_index in NIR
...
This lowering is done in a new NIR pass where the layer is written
before emit_vertex_with_counter for geometry shaders and after the
position for other vertex stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15456 >
2022-03-31 07:34:21 +00:00
Georg Lehmann
141ca78634
radv, aco: Packed iadd_sat/uadd_sat.
...
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421 >
2022-03-28 20:02:52 +00:00
Georg Lehmann
50f585254c
aco: Implement scalar iadd_sat.
...
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421 >
2022-03-28 20:02:52 +00:00
Georg Lehmann
989f2b1785
aco: Implement 64bit uadd_sat.
...
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421 >
2022-03-28 20:02:52 +00:00
Georg Lehmann
01fad6aa72
aco: Remove 0 data components from image stores.
...
Image stores always write a full texel.
The hardware writes zero for components not included in dmask.
Totals from 387 (0.29% of 134913) affected shaders: (GFX10.3)
VGPRs: 17216 -> 17136 (-0.46%)
CodeSize: 1987652 -> 1981504 (-0.31%)
MaxWaves: 9054 -> 9058 (+0.04%)
Instrs: 361883 -> 361115 (-0.21%); split: -0.21%, +0.00%
Latency: 5383187 -> 5381227 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 1373830 -> 1372097 (-0.13%)
VClause: 9031 -> 9038 (+0.08%); split: -0.04%, +0.12%
Copies: 25923 -> 25153 (-2.97%); split: -2.98%, +0.01%
PreVGPRs: 14643 -> 14555 (-0.60%)
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14986 >
2022-03-28 19:43:18 +00:00
Samuel Pitoiset
4cfb5332d6
radv: lower adjusting gl_FragCoord.z for VRS in NIR
...
fossils-db (Sienna Cichlid):
Totals from 4432 (3.29% of 134913) affected shaders:
VGPRs: 231232 -> 231880 (+0.28%)
CodeSize: 24738224 -> 24718008 (-0.08%); split: -0.08%, +0.00%
MaxWaves: 93120 -> 93000 (-0.13%)
Instrs: 4540970 -> 4541062 (+0.00%); split: -0.01%, +0.01%
Latency: 49658353 -> 49641444 (-0.03%); split: -0.05%, +0.01%
InvThroughput: 9604328 -> 9603041 (-0.01%); split: -0.02%, +0.01%
VClause: 66497 -> 66498 (+0.00%)
SClause: 209530 -> 209532 (+0.00%); split: -0.01%, +0.01%
Copies: 276135 -> 276249 (+0.04%); split: -0.14%, +0.18%
PreSGPRs: 189409 -> 189415 (+0.00%)
PreVGPRs: 207368 -> 207458 (+0.04%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15450 >
2022-03-28 16:17:37 +02:00
Samuel Pitoiset
a42b6a4d39
radv: lower load_sample_mask_in in NIR
...
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15450 >
2022-03-28 16:17:34 +02:00
Samuel Pitoiset
8c51874af4
radv,aco: lower color exports in NIR
...
fossils-db (Sienna Cichlid):
Totals from 27108 (20.09% of 134913) affected shaders:
VGPRs: 1260608 -> 1261424 (+0.06%); split: -0.00%, +0.07%
CodeSize: 112795868 -> 112785892 (-0.01%); split: -0.05%, +0.04%
MaxWaves: 628608 -> 628448 (-0.03%); split: +0.00%, -0.03%
Instrs: 20750003 -> 20749314 (-0.00%); split: -0.01%, +0.00%
Latency: 288088081 -> 288015865 (-0.03%); split: -0.06%, +0.04%
InvThroughput: 53944847 -> 53961693 (+0.03%); split: -0.01%, +0.04%
VClause: 396463 -> 396467 (+0.00%); split: -0.02%, +0.02%
SClause: 842088 -> 842150 (+0.01%); split: -0.03%, +0.04%
Copies: 1244982 -> 1259026 (+1.13%); split: -0.01%, +1.14%
PreSGPRs: 1251949 -> 1251909 (-0.00%)
PreVGPRs: 1099647 -> 1100879 (+0.11%); split: -0.03%, +0.14%
fossils-db (Polaris10):
Totals from 23928 (17.60% of 135960) affected shaders:
SGPRs: 1751792 -> 1751024 (-0.04%); split: -0.05%, +0.01%
VGPRs: 1098964 -> 1098556 (-0.04%); split: -0.13%, +0.09%
CodeSize: 99893472 -> 99837940 (-0.06%); split: -0.06%, +0.00%
MaxWaves: 138322 -> 138306 (-0.01%); split: +0.03%, -0.04%
Instrs: 19213995 -> 19211980 (-0.01%); split: -0.02%, +0.01%
Latency: 273026926 -> 273109402 (+0.03%); split: -0.01%, +0.04%
InvThroughput: 111160907 -> 111195187 (+0.03%); split: -0.04%, +0.07%
VClause: 343058 -> 343097 (+0.01%); split: -0.02%, +0.03%
SClause: 802756 -> 802884 (+0.02%); split: -0.04%, +0.06%
Copies: 1729387 -> 1739208 (+0.57%); split: -0.04%, +0.61%
PreSGPRs: 1090264 -> 1090303 (+0.00%); split: -0.00%, +0.01%
PreVGPRs: 959490 -> 960600 (+0.12%); split: -0.04%, +0.15%
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15263 >
2022-03-28 11:47:35 +00:00
Rhys Perry
543a5487cf
radv,aco: lower image descriptor loads in NIR
...
fossil-db (Sienna Cichlid):
Totals from 2926 (1.80% of 162293) affected shaders:
Instrs: 2315110 -> 2306644 (-0.37%); split: -0.37%, +0.00%
CodeSize: 12581592 -> 12546588 (-0.28%); split: -0.28%, +0.00%
VGPRs: 130216 -> 130208 (-0.01%)
SpillSGPRs: 477 -> 474 (-0.63%); split: -5.03%, +4.40%
Latency: 29686188 -> 29678804 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 6926545 -> 6926286 (-0.00%); split: -0.02%, +0.02%
SClause: 73761 -> 72996 (-1.04%); split: -1.16%, +0.12%
Copies: 144068 -> 137279 (-4.71%); split: -4.78%, +0.07%
Branches: 47466 -> 47483 (+0.04%); split: -0.01%, +0.04%
PreSGPRs: 118042 -> 117377 (-0.56%); split: -1.34%, +0.77%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773 >
2022-03-22 16:33:27 +00:00
Rhys Perry
15640e58d9
radv,aco: lower texture descriptor loads in NIR
...
fossil-db (Sienna Cichlid):
Totals from 39445 (24.30% of 162293) affected shaders:
MaxWaves: 875988 -> 875972 (-0.00%)
Instrs: 35372561 -> 35234909 (-0.39%); split: -0.41%, +0.03%
CodeSize: 190237480 -> 189379240 (-0.45%); split: -0.47%, +0.02%
VGPRs: 1889856 -> 1889928 (+0.00%); split: -0.00%, +0.01%
SpillSGPRs: 10764 -> 10857 (+0.86%); split: -2.04%, +2.91%
SpillVGPRs: 1891 -> 1907 (+0.85%); split: -0.32%, +1.16%
Scratch: 260096 -> 261120 (+0.39%)
Latency: 477701150 -> 477578466 (-0.03%); split: -0.06%, +0.03%
InvThroughput: 87819847 -> 87830346 (+0.01%); split: -0.03%, +0.04%
VClause: 673353 -> 673829 (+0.07%); split: -0.04%, +0.11%
SClause: 1385396 -> 1366478 (-1.37%); split: -1.65%, +0.29%
Copies: 2327965 -> 2229134 (-4.25%); split: -4.58%, +0.34%
Branches: 906707 -> 906434 (-0.03%); split: -0.13%, +0.10%
PreSGPRs: 1874153 -> 1862698 (-0.61%); split: -1.34%, +0.73%
PreVGPRs: 1691382 -> 1691383 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773 >
2022-03-22 16:33:27 +00:00
Rhys Perry
52f850238a
radv,aco: lower buffer descriptor loads in NIR
...
fossil-db (Sienna Cichlid):
Totals from 75420 (46.47% of 162293) affected shaders:
MaxWaves: 1878200 -> 1879228 (+0.05%); split: +0.06%, -0.00%
Instrs: 54021103 -> 54141370 (+0.22%); split: -0.04%, +0.26%
CodeSize: 287813520 -> 288293352 (+0.17%); split: -0.04%, +0.21%
VGPRs: 3267576 -> 3266296 (-0.04%); split: -0.04%, +0.00%
SpillSGPRs: 10445 -> 10904 (+4.39%); split: -0.31%, +4.70%
SpillVGPRs: 1818 -> 1811 (-0.39%); split: -1.05%, +0.66%
Scratch: 955392 -> 954368 (-0.11%)
Latency: 563477854 -> 562131282 (-0.24%); split: -0.31%, +0.08%
InvThroughput: 111860104 -> 111553968 (-0.27%); split: -0.30%, +0.02%
VClause: 958432 -> 961415 (+0.31%); split: -0.34%, +0.65%
SClause: 1917415 -> 1926952 (+0.50%); split: -0.69%, +1.19%
Copies: 3812945 -> 3916758 (+2.72%); split: -0.27%, +2.99%
Branches: 1611235 -> 1612022 (+0.05%); split: -0.04%, +0.08%
PreSGPRs: 3095505 -> 3126580 (+1.00%); split: -0.06%, +1.07%
PreVGPRs: 2773011 -> 2773013 (+0.00%)
Most regressions seem to be because ACO's convert_pointer_to_64_bit()
can't be CSE'd with radv_nir_apply_pipeline_layout()'s
convert_pointer_to_64_bit(). This should be improved by later commits.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773 >
2022-03-22 16:33:27 +00:00
Rhys Perry
22050fcf9d
radv,aco: lower vulkan_resource_index in NIR
...
fossil-db (Sienna Cichlid):
Totals from 31338 (19.31% of 162293) affected shaders:
MaxWaves: 758634 -> 758616 (-0.00%)
Instrs: 26398289 -> 26378282 (-0.08%); split: -0.09%, +0.01%
CodeSize: 141048208 -> 140971060 (-0.05%); split: -0.07%, +0.01%
VGPRs: 1373656 -> 1373736 (+0.01%)
SpillSGPRs: 9944 -> 9924 (-0.20%); split: -0.24%, +0.04%
SpillVGPRs: 1892 -> 1898 (+0.32%); split: -0.95%, +1.27%
Latency: 308570144 -> 308528462 (-0.01%); split: -0.03%, +0.02%
InvThroughput: 57698072 -> 57684901 (-0.02%); split: -0.07%, +0.04%
VClause: 440357 -> 440602 (+0.06%); split: -0.02%, +0.08%
SClause: 974724 -> 973315 (-0.14%); split: -0.18%, +0.04%
Copies: 1944925 -> 1945103 (+0.01%); split: -0.11%, +0.12%
Branches: 799444 -> 799461 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1619860 -> 1619233 (-0.04%); split: -0.05%, +0.02%
PreVGPRs: 1252813 -> 1252863 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773 >
2022-03-22 16:33:27 +00:00
Rhys Perry
e6672f6fd1
radv: move radv_declare_shader_args() out of shader_variant_compile()
...
Declaring them earlier will allow us to access them in NIR.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773 >
2022-03-22 16:33:27 +00:00
Rhys Perry
53996cf979
aco: implement load_{scalar,vector}_arg_amd and load_smem_amd
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773 >
2022-03-22 16:33:27 +00:00
Rhys Perry
79c8740c6e
aco: fix fp16 opcode definitions
...
The v_fma_mix optimizations assume v_cvt_f16_f32 and v_mul_f16 use a v2b
definition.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769 >
2022-03-17 19:04:17 +00:00
Rhys Perry
c4cf92cad7
radv,aco,ac/llvm: fix indirect dispatches on the compute queue on GFX7-10
...
Since neither PKT3_LOAD_SH_REG_INDEX nor PKT3_COPY_DATA work with compute
queues on GFX7-10, we have to load the dispatch size from memory in the
shader.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15064 >
2022-03-14 19:54:36 +00:00
Rhys Perry
973967c49d
aco: split and recombine unaligned sgpr inputs
...
An example is the num_work_groups argument. Fixes invalid assembly with
func.compute.num-workgroups.basic.q0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15064 >
2022-03-14 19:54:36 +00:00
Samuel Pitoiset
6c1c9067d9
aco: always emit vk_cvt_pkrtz_f16_f32 for nir_op_pack_half_2x16_split
...
From the VK_KHR_shader_float_controls extension:
"5) Do any of the “Pack” GLSL.std.450 instructions count as
conversion instructions and have the rounding mode applied?"
"RESOLVED: No, only instructions listed in “section 3.32.11.
Conversion Instructions” of the SPIR-V specification count as
conversion instructions."
This is also the same logic as the LLVM backend.
No fossils-db changes on Sienna Cichlid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15301 >
2022-03-09 16:24:20 +00:00
Samuel Pitoiset
342e6f8332
radv,aco,llvm: lower post shuffle vertex in NIR
...
fossils-db (Sienna Cichlid):
Totals from 774 (0.57% of 134913) affected shaders:
VGPRs: 26496 -> 26312 (-0.69%)
CodeSize: 1825936 -> 1828812 (+0.16%); split: -0.04%, +0.20%
MaxWaves: 22046 -> 22062 (+0.07%)
Instrs: 347634 -> 347975 (+0.10%); split: -0.05%, +0.15%
Latency: 1363949 -> 1356426 (-0.55%); split: -0.59%, +0.04%
InvThroughput: 221529 -> 221380 (-0.07%); split: -0.10%, +0.04%
VClause: 5682 -> 5676 (-0.11%); split: -1.46%, +1.36%
SClause: 7485 -> 7411 (-0.99%); split: -1.48%, +0.49%
Copies: 30481 -> 30420 (-0.20%); split: -0.51%, +0.31%
PreVGPRs: 19717 -> 19656 (-0.31%)
fossil-db (Polaris10):
Totals from 896 (0.66% of 135960) affected shaders:
SGPRs: 49824 -> 49648 (-0.35%); split: -0.39%, +0.03%
VGPRs: 31040 -> 29948 (-3.52%); split: -3.62%, +0.10%
CodeSize: 875960 -> 875920 (-0.00%); split: -0.06%, +0.05%
MaxWaves: 6380 -> 6429 (+0.77%)
Instrs: 171522 -> 171482 (-0.02%); split: -0.07%, +0.05%
Latency: 1356082 -> 1334386 (-1.60%); split: -1.61%, +0.01%
InvThroughput: 553389 -> 552957 (-0.08%); split: -0.08%, +0.00%
VClause: 4317 -> 4244 (-1.69%); split: -2.41%, +0.72%
SClause: 6157 -> 6139 (-0.29%); split: -0.45%, +0.16%
Copies: 9340 -> 9235 (-1.12%); split: -1.24%, +0.12%
PreVGPRs: 22366 -> 22116 (-1.12%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15113 >
2022-03-08 19:18:01 +00:00
Samuel Pitoiset
9b113f1b6c
aco: implement nir_op_pack_{uint,sint}_2x16
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15231 >
2022-03-04 08:06:56 +00:00
Rhys Perry
ceca5e68c4
aco: remove vcc hint from branch definitions
...
This doesn't seem to have much benefit anymore.
fossil-db (Sienna Cichlid):
Totals from 198 (0.15% of 134913) affected shaders:
CodeSize: 2610536 -> 2610872 (+0.01%); split: -0.01%, +0.02%
Instrs: 479001 -> 479085 (+0.02%); split: -0.01%, +0.03%
Latency: 7310684 -> 7300735 (-0.14%); split: -0.16%, +0.02%
InvThroughput: 2439084 -> 2437446 (-0.07%); split: -0.07%, +0.00%
SClause: 14760 -> 14722 (-0.26%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432 >
2022-03-03 20:21:08 +00:00
Samuel Pitoiset
516aee64cc
radv,aco: do not lower nir_op_pack_{unorm,snorm}_2x16
...
v_cvt_pknorm_{u16,i16}_f32 can be emitted instead, it's supported on
all generations.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15215 >
2022-03-03 14:54:12 +01:00
Timur Kristóf
11957d3863
aco: Remove superfluous code for mesh shader workgroup ID.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15199 >
2022-03-01 15:37:12 +00:00
Timur Kristóf
1ca6b2f216
aco: Support memory modes properly with load/store_buffer_amd.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15161 >
2022-02-25 14:08:39 +01:00
Timur Kristóf
ba4b48e787
aco: Support task_payload with barriers, refactor allowed storage class.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15161 >
2022-02-25 14:08:36 +01:00
Timur Kristóf
9cc9cf77a8
aco: Fix multiview view index for mesh shaders.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034 >
2022-02-25 06:31:33 +00:00
Timur Kristóf
082b691141
aco: Fix workgroup_id.y and .z for NV_mesh_shader.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034 >
2022-02-25 06:31:33 +00:00
Timur Kristóf
10ebfb3bf2
aco: Allow 1-byte loads and stores with load/store_buffer_amd
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034 >
2022-02-25 06:31:33 +00:00
Samuel Pitoiset
369b8cffea
radv,aco,llvm: lower adjusting vertex alpha in NIR
...
Instead of duplicating the same lowering in both compiler backends.
This pass will be used to do more VS input lowering.
fossils-db (Polaris10):
Totals from 48 (0.04% of 135960) affected shaders:
VGPRs: 1692 -> 1684 (-0.47%)
CodeSize: 54016 -> 53964 (-0.10%); split: -0.11%, +0.01%
MaxWaves: 339 -> 341 (+0.59%)
Instrs: 11260 -> 11247 (-0.12%); split: -0.13%, +0.02%
Latency: 88165 -> 88113 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 36153 -> 36093 (-0.17%)
Copies: 583 -> 568 (-2.57%)
fossils-db (Pitcairn):
Totals from 43 (0.03% of 135960) affected shaders:
VGPRs: 1548 -> 1552 (+0.26%)
CodeSize: 47900 -> 47820 (-0.17%)
Instrs: 10751 -> 10731 (-0.19%)
Latency: 83029 -> 82873 (-0.19%)
VClause: 168 -> 164 (-2.38%)
SClause: 393 -> 391 (-0.51%)
Copies: 705 -> 685 (-2.84%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15076 >
2022-02-22 08:08:42 +00:00
Samuel Pitoiset
cbd5724a6d
aco: implement nir_intrinsic_load_vrs_rates_amd
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713 >
2022-02-16 08:11:19 +01:00
Daniel Schürmann
a5a40beefa
aco: don't emit WQM for bool_to_scalar_condition
...
This was only necessary to ensure that the source is computed in WQM
if the current exec mask is in WQM.
Totals from 23170 (17.17% of 134913) affected shaders: (GFX10.3)
VGPRs: 1384464 -> 1383400 (-0.08%); split: -0.08%, +0.01%
SpillSGPRs: 7575 -> 7574 (-0.01%)
CodeSize: 146444752 -> 146317104 (-0.09%); split: -0.13%, +0.04%
MaxWaves: 429870 -> 429868 (-0.00%)
Instrs: 27202586 -> 27170316 (-0.12%); split: -0.17%, +0.05%
Latency: 379488313 -> 379335412 (-0.04%); split: -0.07%, +0.03%
InvThroughput: 69500561 -> 69487704 (-0.02%); split: -0.03%, +0.01%
VClause: 473080 -> 473038 (-0.01%); split: -0.02%, +0.01%
SClause: 1080576 -> 1080571 (-0.00%); split: -0.06%, +0.06%
Copies: 1492865 -> 1504604 (+0.79%); split: -0.40%, +1.19%
Branches: 711295 -> 716849 (+0.78%); split: -0.01%, +0.79%
PreSGPRs: 1331362 -> 1328402 (-0.22%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Samuel Pitoiset
2451290bc4
radv: rewrite RADV_FORCE_VRS directly in NIR
...
This introduces a small NIR pass that exports
VARYING_SLOT_PRIMITIVE_SHADING_RATE if RADV_FORCE_VRS is used,
instead of doing this in both backend compilers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14907 >
2022-02-09 17:40:34 +01:00
Daniel Schürmann
13c3137960
aco: merge block_kind_uses_[demote|discard_if]
...
These serve the same purpose. The new name is
block_kind_uses_discard.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Daniel Schürmann
08b8500dfb
aco: remove block_kind_discard
...
This case doesn't seem to happen in practice.
No need to micro-optimize it.
This patch merges instruction selection for discard/discard_if.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Daniel Schürmann
b67092e685
aco: emit nir_intrinsic_discard() as p_discard_if()
...
This simplifies the code and emits a slightly better
sequence in some cases.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Rhys Perry
e7f91b194a
radv,aco,ac/llvm: implement fmulz and ffmaz
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436 >
2022-01-20 22:54:42 +00:00