vk_clock_gettime hasn't been used by other implementations ever since
venus and kk migrated over to the common implementation. It'd be better
to drop that helper (or move into anv) because it's not OS agnostic as
compare to the more comprehensive vk_device_get_timestamp.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40582>
This change adds win32 VK_TIME_DOMAIN_QUERY_PERFORMANCE_COUNTER_KHR
support to vk_device_get_timestamp. Meanwhile, vk_clock_gettime is left
untouched preparing for deprecation (anv is the only user). The latter
also only has the host clock part and doesn't handle error cases in a
robust manner.
v2 (zzyiwei):
- vk_device_get_timestamp updates
- use DETECT_OS_WINDOWS
- add commit messages
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40582>
One frustrating thing about the CMP and CMPN instructions is that they
always write the flags. Sometimes, however, it is desirable to generate
the comparison result without modifying the flags. This would,
theoretically, reduce false dependencies that restrict the scheduler's
ability to rearrange code, create more opportunities for cmod
propagation, save a kitten from a tree, and make a rainbow.
Consider this sequence:
cmp.ge.f0.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F
cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D
(+f0.0) if(8) JIP: LABEL19 UIP: LABEL19
It would be advantageous to put the first CMP between the second CMP and
the IF, but this cannot be done since the IF depends on the flags generated
by the second CMP.
This pass enables this rescheduling by changing the first CMP to write
to a different flags register.
cmp.ge.f1.0(8) g103<1>F g101<8,8,1>F g39<8,8,1>F
cmp.nz.f0.0(8) null<1>D g81<8,8,1>D 0D
(+f0.0) if(8) JIP: LABEL19 UIP: LABEL19
Sometimes this is also possible by using a different instruction. For
example, consider
cmp.l.f0.0(8) g103<1>D g101<8,8,1>D 0D
This produces 0xffffffff when g101 negative and zero otherwise. This
instruction, which does not modifiy the flag, also produces these results:
asr(8) g103<1>D g101<8,8,1>D 31D
Gfx9 platforms take a hit on instructions due to the instruction added
at the end of short shaders by brw_workaround_source_arf_before_eot.
shader-db:
Lunar Lake, Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 17089451 -> 17088766 (<.01%)
instructions in affected programs: 766613 -> 765928 (-0.09%)
helped: 653 / HURT: 0
total cycles in shared programs: 888832986 -> 887873068 (-0.11%)
cycles in affected programs: 549441852 -> 548481934 (-0.17%)
helped: 10474 / HURT: 130
LOST: 9
GAINED: 0
Skylake
total instructions in shared programs: 19037976 -> 19049719 (0.06%)
instructions in affected programs: 3979914 -> 3991657 (0.30%)
helped: 503 / HURT: 12303
total cycles in shared programs: 867918242 -> 866930801 (-0.11%)
cycles in affected programs: 512773919 -> 511786478 (-0.19%)
helped: 13858 / HURT: 66
LOST: 32
GAINED: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 925023504 -> 924950382 (-0.01%); split: -0.01%, +0.00%
Cycle count: 106348432916 -> 106116809009 (-0.22%); split: -0.22%, +0.00%
Spill count: 3423988 -> 3423930 (-0.00%); split: -0.00%, +0.00%
Fill count: 4877087 -> 4876960 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 49087552 -> 49078448 (-0.02%); split: +0.00%, -0.02%
Totals from 1099332 (54.44% of 2019443) affected shaders:
Instrs: 742670473 -> 742597351 (-0.01%); split: -0.01%, +0.00%
Cycle count: 100455549635 -> 100223925728 (-0.23%); split: -0.23%, +0.00%
Spill count: 3384366 -> 3384308 (-0.00%); split: -0.00%, +0.00%
Fill count: 4837434 -> 4837307 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 26725152 -> 26716048 (-0.03%); split: +0.00%, -0.03%
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 997603774 -> 997529238 (-0.01%); split: -0.01%, +0.00%
Cycle count: 93904012762 -> 93646730006 (-0.27%); split: -0.28%, +0.00%
Spill count: 3710155 -> 3710125 (-0.00%); split: -0.00%, +0.00%
Fill count: 5032908 -> 5032819 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 37929640 -> 37811560 (-0.31%)
Totals from 1334920 (58.52% of 2281134) affected shaders:
Instrs: 817377787 -> 817303251 (-0.01%); split: -0.01%, +0.00%
Cycle count: 88468851658 -> 88211568902 (-0.29%); split: -0.29%, +0.00%
Spill count: 3663353 -> 3663323 (-0.00%); split: -0.00%, +0.00%
Fill count: 4991629 -> 4991540 (-0.00%); split: -0.01%, +0.00%
Max dispatch width: 20245832 -> 20127752 (-0.58%)
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 1013433769 -> 1013363273 (-0.01%); split: -0.01%, +0.00%
Cycle count: 85766921182 -> 85509316620 (-0.30%); split: -0.31%, +0.00%
Spill count: 3903923 -> 3903944 (+0.00%); split: -0.00%, +0.00%
Fill count: 6801983 -> 6801948 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 37896320 -> 37805320 (-0.24%); split: +0.00%, -0.24%
Totals from 1333814 (58.54% of 2278396) affected shaders:
Instrs: 830200531 -> 830130035 (-0.01%); split: -0.01%, +0.00%
Cycle count: 80746184101 -> 80488579539 (-0.32%); split: -0.32%, +0.01%
Spill count: 3855771 -> 3855792 (+0.00%); split: -0.00%, +0.00%
Fill count: 6755513 -> 6755478 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 20301456 -> 20210456 (-0.45%); split: +0.00%, -0.45%
Skylake
Totals:
Instrs: 519389758 -> 519874108 (+0.09%); split: -0.00%, +0.10%
Cycle count: 57932316132 -> 57789433956 (-0.25%); split: -0.25%, +0.00%
Spill count: 636741 -> 636715 (-0.00%); split: -0.01%, +0.00%
Fill count: 860470 -> 860357 (-0.01%); split: -0.02%, +0.00%
Max dispatch width: 32527800 -> 32481792 (-0.14%); split: +0.00%, -0.14%
Totals from 1080380 (62.25% of 1735462) affected shaders:
Instrs: 411976399 -> 412460749 (+0.12%); split: -0.00%, +0.12%
Cycle count: 54291447615 -> 54148565439 (-0.26%); split: -0.27%, +0.00%
Spill count: 602993 -> 602967 (-0.00%); split: -0.01%, +0.00%
Fill count: 734459 -> 734346 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 18626096 -> 18580088 (-0.25%); split: +0.00%, -0.25%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
The Bspec says that SEL sources and destination can be any mix of *B,
*W, and *D. We should allow those. Specifically, without this change,
this instruction
sel.sat.l(8) v548:UD, v899:D, 255d
gets unnecessarily split into two instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
Prevents assertion failures in func.shader-ballot.basic.q0 and other
tests starting with "nir/algebraic: Optimize some b2f of integer
comparison".
Vector immediates, bfloat, and 8-bit floats are still not supported.
v2: Almost complete re-write based on suggestions from Ken.
v3: Don't retype() on a brw_imm_f value.
Fixes: f8e54d02f7 ("intel/compiler: Relax mixed type restriction for saturating immediates")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38978>
Move options were bit or-ing from the wrong enum, causing undefined
behaviour when the number of intrinsics changed.
Replaced it with the values from the right nir_move_options enum that
were previously working. (Further refinement needed on these after
extensive testing.)
Fixes: f1b24267d2 ("pco: rework nir processing and passes")
Signed-off-by: Radu Costas <radu.costas@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40568>
If the FD passed in isn't from a render node, try to determine the
corresponding render node and open it. If that succeeds, pass the
render node FD to ac_drm_device_initialize.
The existing code already detects when ac_drm_device_get_fd doesn't
return the FD passed in, and handles that case correctly.
This avoids issues with unauthenticated FDs from card nodes.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7289
v2:
* Always close render_fd after calling ac_drm_device_initialize for it.
(Pierre-Eric Pelloux-Prayer)
* Formatting tweaks for logging when ac_drm_device_initialize fails for
render_fd.
v3: (Pierre-Eric Pelloux-Prayer)
* Log render_device path when ac_drm_device_initialize fails for
render_fd.
* Fix render_device string leak.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40519>
They check for if uses and want to return false but nir_foreach_use()
means the if uses are never seen.
Cc: mesa-stable
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37481>
these functions should no longer be used by serious drivers. for those that
do use them, they now need to pass their own destructor function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40462>