Commit graph

172842 commits

Author SHA1 Message Date
Caio Oliveira
40ba00238b compiler/types: Tidy up the asserts in get_*_instance functions
Use the local variable in the assertions, move them out of the critical region.
No behavior change.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23279>
2023-06-15 03:43:46 +00:00
Caio Oliveira
efbbdeffc0 compiler/types: Be consistent when naming array element/size
The element type passed is different than the array type and it is not
a "base type" in the glsl_type sense, so pick a name that reflects that.
Also stick to a single name for the array_size.

Just renames, no behavior change.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23279>
2023-06-15 03:43:46 +00:00
Jesse Natalie
83f741124b nir_lower_returns: Mark assert-only var as ASSERTED
Fixes: 5d238c0c ("nir_lower_returns: Optimize phis before beginning the pass")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23634>
2023-06-15 03:09:29 +00:00
Dave Airlie
13df91d7d7 radv/video: restrict the number of IBs on video related queues.
The hardware gets given a session context from userspace in each
submission, but if the session context changes the hardware wants
a FENCE to be emitted to know it can give up the current session.

If a test submits interleaved session context accesses in a single
Vulkan submit, the hardware crashes unless each IB is submitted
in a separate submission so the fence can be sent.

In theory it could be possible to construct a single command buffer
to trigger this so I do think the hardware should be smarter here.

Should this be fixed in the kernel to always emit a fence between
IBs?

Fixes: dEQP-VK.video.decode.h264_interleaved

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23641>
2023-06-15 02:49:00 +00:00
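The restriction described in 13df91d7d7 amounts to splitting one logical submit into several smaller ones. The sketch below is only an illustration in Rust, not the radv code: `split_submissions` and `max_ibs` are hypothetical names, and the limit the driver actually uses is not given in the commit message. A limit of 1 mirrors "each IB is submitted in a separate submission".

```rust
/// Split a list of IBs into per-submission groups of at most `max_ibs` entries.
/// Hypothetical helper; the generic `T` stands in for whatever describes an IB.
fn split_submissions<T: Clone>(ibs: &[T], max_ibs: usize) -> Vec<Vec<T>> {
    assert!(max_ibs > 0, "a submission must hold at least one IB");
    // `chunks` yields consecutive groups; the last one may be shorter.
    ibs.chunks(max_ibs).map(|chunk| chunk.to_vec()).collect()
}
```

With `max_ibs = 1`, three IBs become three submissions, which would give the kernel a chance to emit a fence between them.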
LingMan
0535948535 rusticl: fix UB in CLProp machinery
Viewing structs as a collection of u8 is not generally sound. Any padding bytes might be
uninitialized and creating an integer from uninitialized memory constitutes producing an invalid
value, which is instant UB.

Since we only copy these bytes around, the fix is simply to work with MaybeUninit<u8> instead, which can handle uninitialized memory just fine.

See: https://doc.rust-lang.org/reference/behavior-considered-undefined.html
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
2023-06-15 02:31:19 +00:00
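The idea behind 0535948535 can be sketched as follows. This is a minimal illustration, not the rusticl implementation: `Padded`, `as_maybe_uninit_bytes`, and `copy_raw` are hypothetical names chosen for the example.

```rust
use std::mem::{size_of, MaybeUninit};

/// Hypothetical struct with padding: with repr(C), three padding bytes
/// follow `tag` so that `value` is 4-byte aligned.
#[repr(C)]
#[derive(Clone, Copy)]
struct Padded {
    tag: u8,
    value: u32,
}

/// View any value as a slice of possibly-uninitialized bytes.
fn as_maybe_uninit_bytes<T>(val: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: the pointer is valid for size_of::<T>() bytes, u8 has
    // alignment 1, and MaybeUninit<u8> is allowed to hold padding bytes.
    unsafe {
        std::slice::from_raw_parts(val as *const T as *const MaybeUninit<u8>, size_of::<T>())
    }
}

/// Copy the raw bytes of `val` into `dst` without ever reading them as `u8`.
/// Returns the number of bytes copied.
fn copy_raw<T>(val: &T, dst: &mut [MaybeUninit<u8>]) -> usize {
    let src = as_maybe_uninit_bytes(val);
    dst[..src.len()].copy_from_slice(src);
    src.len()
}
```

Viewing the same memory as `&[u8]` would assert that every byte, padding included, is an initialized integer; `MaybeUninit<u8>` makes no such claim, so plain copying stays sound.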
LingMan
fdcb86168d rusticl: drop cl_prop_for_type macro
There's no reason to differentiate between primitive types and structs here. `cl_prop_for_struct`
can handle primitive types just fine.
Drop `cl_prop_for_type` and rename the existing `cl_prop_for_struct` to `cl_prop_for_type`.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
2023-06-15 02:31:19 +00:00
LingMan
cf43a74c79 rusticl: drop CLProp implementation for String
Route the data to the implementation for &str instead. It works just as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
2023-06-15 02:31:19 +00:00
LingMan
f1461c5a77 rusticl: core: stop using cl_prop from the api module
It's a layering violation and really the wrong tool for the job. Add a new fn to view a given slice
as a &[u8] instead of going through the clprop machinery which creates a new Vec.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
2023-06-15 02:31:18 +00:00
Charmaine Lee
2755519142 svga: fix compute shader type after ntt
Reset compute shader type after ntt.

Fixes: 0ac9541804 ("gallium: Drop PIPE_SHADER_CAP_PREFERRED_IR")

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23659>
2023-06-15 02:11:38 +00:00
Karol Herbst
095fee55f8 rusticl: enforce using unsafe blocks in unsafe functions
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23660>
2023-06-15 01:53:11 +00:00
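Rust has a lint, `unsafe_op_in_unsafe_fn`, that requires explicit `unsafe` blocks inside `unsafe fn` bodies. That 095fee55f8 relies on this lint is an assumption here, since the commit has no body text. The function below is a hypothetical example of the resulting style, with the lint applied as an outer attribute so the snippet stands alone.

```rust
/// Reads the value behind a raw pointer.
///
/// # Safety
/// `ptr` must be non-null, aligned, and point to an initialized `u32`.
#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn read_first(ptr: *const u32) -> u32 {
    // With the lint denied, the dereference needs its own unsafe block
    // even though the enclosing function is already `unsafe`.
    unsafe { *ptr }
}
```

The benefit is that each unsafe operation is marked individually, so the parts of an `unsafe fn` that are actually unsafe stay visible during review.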
Mike Blumenkrantz
4edbe8f5a0 zink: add mem debugging
modeled off turnip's debug infra, this adds debug printing for oom
scenarios

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23653>
2023-06-15 01:31:24 +00:00
Mike Blumenkrantz
65fad783c7 zink: break out vk flag unrolling into util function
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23653>
2023-06-15 01:31:24 +00:00
Ian Romanick
de60b463d7 nir/algebraic: Simplify various trivial bfi
These are mostly just obvious patterns that somebody will eventually
want to add.

DG2, Tiger Lake, Ice Lake, Skylake, Broadwell, and Haswell had similar
results (Ice Lake shown)
total instructions in shared programs: 20570033 -> 20570026 (<.01%)
instructions in affected programs: 7363 -> 7356 (-0.10%)
helped: 6 / HURT: 0

total cycles in shared programs: 902118781 -> 902118854 (<.01%)
cycles in affected programs: 419132 -> 419205 (0.02%)
helped: 4 / HURT: 2

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819500 -> 152819380 (-0.00%)
Cycles: 15014627187 -> 15014624437 (-0.00%)

Totals from 115 (0.02% of 662497) affected shaders:
Instrs: 28963 -> 28843 (-0.41%)
Cycles: 404582 -> 401832 (-0.68%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
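For readers unfamiliar with `bfi` (bitfield insert), the following Rust sketch models the operation that the commits in this series optimize. The semantics are an assumption, not something stated in the commit messages: with a nonzero mask, `insert` is shifted left to the mask's lowest set bit and merged into `base` under `mask`; a zero mask returns `base` unchanged.

```rust
/// Bitfield insert under the semantics assumed in the lead-in.
fn bfi(mask: u32, insert: u32, base: u32) -> u32 {
    if mask == 0 {
        return base;
    }
    // Position of the lowest set bit of the mask (at most 31 here).
    let shift = mask.trailing_zeros();
    (base & !mask) | ((insert << shift) & mask)
}
```

Under this model, several identities follow directly: `bfi(0, a, b) == b`, `bfi(m, 0, b) == b & !m`, and `bfi(m, a, 0) == (a << m.trailing_zeros()) & m`. Whether these match the exact patterns added in de60b463d7 is not stated in the message.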
Ian Romanick
541e7eb389 nir/algebraic: Optimize some u2f of bfi
v2: Fix a copy-and-paste bug s/('find_lsb', a)/a/ in the patterns. See
piglit!819.

DG2, Tiger Lake, Ice Lake, Skylake, and Broadwell had similar results (Ice Lake shown)
total instructions in shared programs: 20570063 -> 20570033 (<.01%)
instructions in affected programs: 452 -> 422 (-6.64%)
helped: 30 / HURT: 0

total cycles in shared programs: 902118723 -> 902118781 (<.01%)
cycles in affected programs: 1762 -> 1820 (3.29%)
helped: 0 / HURT: 29

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152819969 -> 152819500 (-0.00%)
Cycles: 15014628652 -> 15014627187 (-0.00%); split: -0.00%, +0.00%

Totals from 469 (0.07% of 662497) affected shaders:
Instrs: 7644 -> 7175 (-6.14%)
Cycles: 31787 -> 30322 (-4.61%); split: -4.90%, +0.29%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
96cde9cc01 intel/fs: Emit better code for bfi(..., 0)
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20570141 -> 20570063 (<.01%)
instructions in affected programs: 30679 -> 30601 (-0.25%)
helped: 77 / HURT: 0

total cycles in shared programs: 902113977 -> 902118723 (<.01%)
cycles in affected programs: 3255958 -> 3260704 (0.15%)
helped: 60 / HURT: 19

Broadwell
total instructions in shared programs: 18524633 -> 18524547 (<.01%)
instructions in affected programs: 34095 -> 34009 (-0.25%)
helped: 75 / HURT: 2

total cycles in shared programs: 949532394 -> 949543761 (<.01%)
cycles in affected programs: 3419107 -> 3430474 (0.33%)
helped: 57 / HURT: 24

total spills in shared programs: 22484 -> 22484 (0.00%)
spills in affected programs: 516 -> 516 (0.00%)
helped: 2 / HURT: 2

total fills in shared programs: 29346 -> 29338 (-0.03%)
fills in affected programs: 572 -> 564 (-1.40%)
helped: 4 / HURT: 0

Haswell
total instructions in shared programs: 17331356 -> 17331523 (<.01%)
instructions in affected programs: 27920 -> 28087 (0.60%)
helped: 41 / HURT: 4

total cycles in shared programs: 936603192 -> 936574664 (<.01%)
cycles in affected programs: 3417695 -> 3389167 (-0.83%)
helped: 28 / HURT: 21

total spills in shared programs: 19718 -> 19756 (0.19%)
spills in affected programs: 436 -> 474 (8.72%)
helped: 0 / HURT: 4

total fills in shared programs: 22547 -> 22607 (0.27%)
fills in affected programs: 444 -> 504 (13.51%)
helped: 0 / HURT: 4

Ivy Bridge
total cycles in shared programs: 463451277 -> 463451273 (<.01%)
cycles in affected programs: 95870 -> 95866 (<.01%)
helped: 3 / HURT: 2

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152825278 -> 152819969 (-0.00%); split: -0.00%, +0.00%
Cycles: 15014075626 -> 15014628652 (+0.00%); split: -0.01%, +0.01%
Subgroup size: 8528536 -> 8528560 (+0.00%)
Send messages: 7711431 -> 7711464 (+0.00%)
Spill count: 99907 -> 99509 (-0.40%); split: -0.40%, +0.00%
Fill count: 202459 -> 201598 (-0.43%); split: -0.43%, +0.00%
Scratch Memory Size: 4376576 -> 4371456 (-0.12%)

Totals from 2915 (0.44% of 662497) affected shaders:
Instrs: 2288842 -> 2283533 (-0.23%); split: -0.24%, +0.01%
Cycles: 471633295 -> 472186321 (+0.12%); split: -0.27%, +0.39%
Subgroup size: 27488 -> 27512 (+0.09%)
Send messages: 151344 -> 151377 (+0.02%)
Spill count: 48091 -> 47693 (-0.83%); split: -0.83%, +0.00%
Fill count: 59053 -> 58192 (-1.46%); split: -1.46%, +0.00%
Scratch Memory Size: 1827840 -> 1822720 (-0.28%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
6603948a7a nir/algebraic: Lower some bfi with two constant sources
All Haswell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19907054 -> 19906882 (<.01%)
instructions in affected programs: 8103 -> 7931 (-2.12%)
helped: 52 / HURT: 0

total cycles in shared programs: 855779334 -> 855781791 (<.01%)
cycles in affected programs: 724201 -> 726658 (0.34%)
helped: 38 / HURT: 7

total sends in shared programs: 1039308 -> 1039302 (<.01%)
sends in affected programs: 162 -> 156 (-3.70%)
helped: 2 / HURT: 0

No shader-db changes on any older Intel platforms.

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 153117340 -> 152825222 (-0.19%); split: -0.19%, +0.00%
Cycles: 15011904351 -> 15014072944 (+0.01%); split: -0.04%, +0.05%
Send messages: 7711509 -> 7711421 (-0.00%)
Spill count: 100745 -> 99907 (-0.83%); split: -0.85%, +0.02%
Fill count: 203684 -> 202459 (-0.60%); split: -0.62%, +0.02%
Scratch Memory Size: 4403200 -> 4376576 (-0.60%)

Totals from 18603 (2.81% of 662496) affected shaders:
Instrs: 5258303 -> 4966185 (-5.56%); split: -5.56%, +0.00%
Cycles: 447391388 -> 449559981 (+0.48%); split: -1.29%, +1.77%
Send messages: 559231 -> 559143 (-0.02%)
Spill count: 5009 -> 4171 (-16.73%); split: -17.17%, +0.44%
Fill count: 8769 -> 7544 (-13.97%); split: -14.33%, +0.36%
Scratch Memory Size: 194560 -> 167936 (-13.68%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
e419eefd34 intel/fs: Use nir_opt_reassociate_bfi
All Skylake and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19907072 -> 19907054 (<.01%)
instructions in affected programs: 8859 -> 8841 (-0.20%)
helped: 9 / HURT: 0

total cycles in shared programs: 855791238 -> 855779334 (<.01%)
cycles in affected programs: 3308294 -> 3296390 (-0.36%)
helped: 12 / HURT: 13

Broadwell
total instructions in shared programs: 17818231 -> 17817440 (<.01%)
instructions in affected programs: 9887 -> 9096 (-8.00%)
helped: 9 / HURT: 0

total cycles in shared programs: 902970035 -> 902941221 (<.01%)
cycles in affected programs: 2767243 -> 2738429 (-1.04%)
helped: 14 / HURT: 5

total spills in shared programs: 17784 -> 17718 (-0.37%)
spills in affected programs: 318 -> 252 (-20.75%)
helped: 1 / HURT: 0

total fills in shared programs: 25458 -> 24949 (-2.00%)
fills in affected programs: 1346 -> 837 (-37.82%)
helped: 1 / HURT: 0

Haswell
total instructions in shared programs: 16707799 -> 16707586 (<.01%)
instructions in affected programs: 24049 -> 23836 (-0.89%)
helped: 41 / HURT: 0

total cycles in shared programs: 882730648 -> 882723174 (<.01%)
cycles in affected programs: 5096737 -> 5089263 (-0.15%)
helped: 25 / HURT: 12

total spills in shared programs: 14937 -> 14909 (-0.19%)
spills in affected programs: 436 -> 408 (-6.42%)
helped: 4 / HURT: 0

total fills in shared programs: 17569 -> 17529 (-0.23%)
fills in affected programs: 444 -> 404 (-9.01%)
helped: 4 / HURT: 0

No shader-db changes on any older Intel platforms.

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 153118594 -> 153117340 (-0.00%); split: -0.00%, +0.00%
Cycles: 15011967556 -> 15011904351 (-0.00%); split: -0.00%, +0.00%
Fill count: 203692 -> 203684 (-0.00%)

Totals from 703 (0.11% of 662496) affected shaders:
Instrs: 192826 -> 191572 (-0.65%); split: -0.65%, +0.00%
Cycles: 29937640 -> 29874435 (-0.21%); split: -0.25%, +0.04%
Fill count: 4146 -> 4138 (-0.19%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
83bd87c558 nir: Add optimization pass to reassociate some bfi instructions
The needs of this pass are ever so slightly more than what
nir_opt_algebraic can do. :( Specifically, it needs to be able to look
at the relationship of constant values used in an expression tree.

v2: Add nir_mov_alu to handle swizzles on the original sources.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Mike Blumenkrantz
a085fead0c zink: add some ci flakes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23654>
2023-06-14 18:18:41 +00:00
Daniel Stone
2760aeb13e CI: Re-enable freedreno CI
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
2023-06-14 17:39:29 +00:00
Daniel Stone
6af691dfff ci: Extend a618_vk_full runtime
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
2023-06-14 17:39:29 +00:00
Daniel Stone
c41d493f77 ci: Don't retry manual or scheduled jobs
Only retry when there's some kind of non-job failure, such as
runner-internal issues, or API/network issues, etc. If the job itself
fails or times out, then given the length of these jobs, there's no
point trying again and just tying up the job slots for even more hours.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
2023-06-14 17:39:29 +00:00
Daniel Stone
47991a094e ci: Elaborate causes for job retries
Rather than always retrying, only retry jobs on a limited set of causes.
This notably excludes retries when a job is stuck due to lack of runners
to schedule it; if we can't get a slot on a runner in time, there's no
reason to try again, since our window of opportunity has gone.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23108>
2023-06-14 17:39:29 +00:00
Emma Anholt
5ef4e1c4c0 ci: Drop some skips of GL CTS ArraysOfArrays tests.
My hope is that with my CTS fix, we can complete these all in time now.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23610>
2023-06-14 16:45:23 +00:00
Emma Anholt
97744f11cf ci: Drop skips for some previously-invalid CTS tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23610>
2023-06-14 16:45:23 +00:00
Emma Anholt
8c35537351 ci: Update to vulkan-cts-1.3.5.2 (and pull in some more fixes).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23610>
2023-06-14 16:45:23 +00:00
Emma Anholt
e3b0a79b3a ci/zink: Update current xfails on tgl.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23610>
2023-06-14 16:45:23 +00:00
Emma Anholt
10b94772d2 intel: Reduce cost of resetting last_grf_write.
In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of
last_grf_write 2400 times, multiple times through the scheduler.  Just
resetting for the processed instructions reduces runtime from 21s to 16s.
No change on steam shader-db runtime across several runs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
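The saving in 10b94772d2 comes from clearing only the entries that were written instead of memsetting the whole array. The Rust sketch below shows the general technique with a hypothetical `LastWriteTracker`. It keeps an explicit list of touched registers, which is a substitute for the commit's approach of walking the processed instructions, and the array is allocated once, in the spirit of 7d4769e802.

```rust
/// Tracks, per register, the index of the last instruction that wrote it.
struct LastWriteTracker {
    last_write: Vec<Option<usize>>,
    /// Registers written since the last reset (hypothetical bookkeeping).
    touched: Vec<usize>,
}

impl LastWriteTracker {
    /// Allocate once for `reg_count` registers and reuse across blocks.
    fn new(reg_count: usize) -> Self {
        Self { last_write: vec![None; reg_count], touched: Vec::new() }
    }

    /// Remember that instruction `instr` wrote register `reg`.
    fn record_write(&mut self, reg: usize, instr: usize) {
        if self.last_write[reg].is_none() {
            self.touched.push(reg);
        }
        self.last_write[reg] = Some(instr);
    }

    fn last_writer(&self, reg: usize) -> Option<usize> {
        self.last_write[reg]
    }

    /// Clear only the touched entries: cost scales with the number of
    /// written registers, not with the size of the array.
    fn reset(&mut self) {
        for reg in self.touched.drain(..) {
            self.last_write[reg] = None;
        }
    }
}
```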
Emma Anholt
7d4769e802 intel: Allocate the last_grf_write once per scheduler.
No need to re-calloc it per block when we're going to use it again.  Also,
this fixes the vec4 backend to avoid allocating giant grf_count-sized
arrays on the stack.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
2ad865b219 intel: Count reads_remaining across all blocks.
We were zeroing it out per block, but it doesn't actually help to count
per block, since the question is "will scheduling this instruction free
the reg?".  Saves some memsetting, which was showing up high in the
profile (but not from this source).

No change on iris SKL shader-db.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:55 +00:00
Mike Blumenkrantz
12a47b84b7 egl/dri2: trigger drawable invalidation from surface queries for zink
this mimics dri3 behavior and avoids scenarios where renderbuffers can
get out of sync with their resources

fixes #6744

Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22858>
2023-06-14 15:38:21 +00:00
Mike Blumenkrantz
1563aea69f lavapipe: add version uuid to shader binary validation
this ensures compatible shader binaries across versions

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23636>
2023-06-14 14:32:36 +00:00
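The validation idea in 1563aea69f can be illustrated with a small Rust sketch. The blob layout (a 16-byte UUID header followed by the payload) and the names `UUID_LEN` and `validate_binary` are assumptions made for the example. The actual lavapipe format is not described in the commit message.

```rust
/// Assumed header size for this sketch.
const UUID_LEN: usize = 16;

/// Return the payload only if the blob was produced by the same build,
/// identified by `current_uuid`. Hypothetical layout: UUID, then payload.
fn validate_binary<'a>(blob: &'a [u8], current_uuid: &[u8; UUID_LEN]) -> Option<&'a [u8]> {
    if blob.len() < UUID_LEN {
        return None;
    }
    let (header, payload) = blob.split_at(UUID_LEN);
    if header == &current_uuid[..] {
        Some(payload)
    } else {
        // Binary from another version: reject it rather than load it.
        None
    }
}
```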
Gert Wollny
b79f6ec397 r600: Disable SB if we use the variable length DOT
sb doesn't know about this instruction, so don't try to run the
optimizer.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23647>
2023-06-14 13:14:19 +00:00
Gert Wollny
269895c674 r600/sfn: Trigger use of ACK for some barriers
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23647>
2023-06-14 13:14:19 +00:00
Gert Wollny
d6280a8eef r600/sfn: move kill handling to fully scheduling
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23647>
2023-06-14 13:14:19 +00:00
Gert Wollny
f7e6171f3a r600: fix handling of use_sb flag
The compiler will use the unsigned bit pattern of the check and combine this
with the 1 bit, which always results in use_sb being zero.

Fix this by making use_sb a bool.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23647>
2023-06-14 13:14:19 +00:00
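The bug fixed in f7e6171f3a is a C bitfield truncation issue. Rust has no bitfields, so the sketch below only models the two storage behaviours with hypothetical helpers: `FLAG` is an illustrative constant, not the real r600 flag, and the model assumes that the checked bit is not bit 0.

```rust
/// Hypothetical flag living in a bit other than bit 0.
const FLAG: u32 = 1 << 3;

/// Models storing `value` into a 1-bit unsigned C bitfield:
/// only the lowest bit of the value survives.
fn store_one_bit(value: u32) -> u32 {
    value & 1
}

/// Models storing `value` into a C `bool`: any nonzero value becomes true.
fn store_bool(value: u32) -> bool {
    value != 0
}
```

With `check = flags & FLAG`, the one-bit store yields 0 whenever the flag is set, because bit 0 of the result is clear, while the `bool` store yields `true`.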
Mike Blumenkrantz
4e87d81d20 zink: add a dgc debug mode for testing
this is useful for drivers trying to implement DGC since there is no cts

do not use.

it will not make anything faster.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23550>
2023-06-14 12:37:24 +00:00
Lionel Landwerlin
6b9f838d62 intel/fs: handle load_global_constant_uniform_block_intel
Again, load the data just once in GRF, share it across lanes.

Shader-db on dg2:

total instructions in shared programs: 23214555 -> 23215400 (<.01%)
instructions in affected programs: 199977 -> 200822 (0.42%)
helped: 3
HURT: 38
helped stats (abs) min: 5 max: 670 x̄: 283.67 x̃: 176
helped stats (rel) min: 1.34% max: 49.41% x̄: 22.15% x̃: 15.70%
HURT stats (abs)   min: 1 max: 185 x̄: 44.63 x̃: 32
HURT stats (rel)   min: 0.13% max: 42.86% x̄: 10.25% x̃: 9.30%
95% mean confidence interval for instructions value: -18.65 59.87
95% mean confidence interval for instructions %-change: 3.29% 12.47%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 5928 -> 5928 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 851137495 -> 851152449 (<.01%)
cycles in affected programs: 16406137 -> 16421091 (0.09%)
helped: 9
HURT: 32
helped stats (abs) min: 10 max: 13498 x̄: 6443.22 x̃: 5581
helped stats (rel) min: 0.11% max: 4.75% x̄: 1.45% x̃: 0.34%
HURT stats (abs)   min: 3 max: 15056 x̄: 2279.47 x̃: 735
HURT stats (rel)   min: 0.10% max: 23.71% x̄: 4.58% x̃: 4.65%
95% mean confidence interval for cycles value: -1315.40 2044.87
95% mean confidence interval for cycles %-change: 1.71% 4.80%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 11856 -> 11825 (-0.26%)
spills in affected programs: 2368 -> 2337 (-1.31%)
helped: 4
HURT: 0

total fills in shared programs: 16258 -> 16207 (-0.31%)
fills in affected programs: 2930 -> 2879 (-1.74%)
helped: 4
HURT: 0

total sends in shared programs: 1038194 -> 1038185 (<.01%)
sends in affected programs: 40 -> 31 (-22.50%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.25 x̃: 2
helped stats (rel) min: 10.00% max: 33.33% x̄: 21.46% x̃: 21.25%
95% mean confidence interval for sends value: -4.64 0.14
95% mean confidence interval for sends %-change: -40.41% -2.51%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 0

Results for some VK/DX titles (on DG2 only) mostly show additional
instructions, except for the unity spaceship demo, where a CS
shader gets additional SIMDness. The reason for the additional
instructions is that since we're doing block loads, we need to find
the live channels in control flow to select a single lane value that
is valid.

aztec_ruins_high:
Totals from 3 (1.12% of 269) affected shaders:
Instrs: 17732 -> 17896 (+0.92%)
Cycles: 796518 -> 819302 (+2.86%)

cyberpunk_2077:
Totals from 17 (0.17% of 10301) affected shaders:
Instrs: 10848 -> 11658 (+7.47%)
Cycles: 248243 -> 259168 (+4.40%); split: -0.57%, +4.97%

fallout_4_dxvk_g2:
Totals from 2 (0.12% of 1638) affected shaders:
Instrs: 3157 -> 3368 (+6.68%)
Cycles: 487807 -> 490426 (+0.54%); split: -0.26%, +0.79%
Max live registers: 139 -> 141 (+1.44%)

red_dead_redemption2:
Totals from 68 (1.14% of 5970) affected shaders:
Instrs: 34871 -> 36486 (+4.63%)
Cycles: 551430 -> 565211 (+2.50%)
Send messages: 2074 -> 2072 (-0.10%)
Max live registers: 5078 -> 5077 (-0.02%)

total_war_warhammer2:
Totals from 5 (1.05% of 478) affected shaders:
Instrs: 6905 -> 6971 (+0.96%); split: -0.16%, +1.12%
Cycles: 97035 -> 97989 (+0.98%); split: -0.07%, +1.05%

unity spaceship demo (instruction count going up due to a CS shader
                      bump from SIMD8->16):
Totals from 53 (9.71% of 546) affected shaders:
Instrs: 223748 -> 233223 (+4.23%); split: -0.01%, +4.25%
Cycles: 23134697 -> 25207080 (+8.96%); split: -0.17%, +9.13%
Subgroup size: 480 -> 488 (+1.67%)
Spill count: 2156 -> 2242 (+3.99%); split: -0.19%, +4.17%
Fill count: 4617 -> 4845 (+4.94%); split: -0.09%, +5.02%
Max live registers: 5991 -> 6050 (+0.98%); split: -0.40%, +1.39%
Max dispatch width: 480 -> 488 (+1.67%)

witcher_3_dxvk_g2:
Totals from 27 (2.51% of 1074) affected shaders:
Instrs: 57067 -> 57677 (+1.07%); split: -0.03%, +1.10%
Cycles: 1397871 -> 1436704 (+2.78%); split: -0.35%, +3.13%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
4ee1a8bb9c nir: add a load_global_constant uniform intel variant
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
5ae8a78d8c intel/fs: make use of load_ubo_uniform_block_intel
The principle is the same as the load_ssbo_uniform_block_intel.
Whenever we see a uniform offset, load the data only once in GRFs to
reduce register pressure.

Iris shader-db run on DG2 :

total instructions in shared programs: 23001325 -> 23094969 (0.41%)
instructions in affected programs: 1775989 -> 1869633 (5.27%)
helped: 764
HURT: 2097
helped stats (abs) min: 1 max: 102 x̄: 6.96 x̃: 2
helped stats (rel) min: 0.03% max: 16.91% x̄: 1.36% x̃: 0.63%
HURT stats (abs)   min: 1 max: 2461 x̄: 47.19 x̃: 7
HURT stats (rel)   min: <.01% max: 199.34% x̄: 5.91% x̃: 2.60%
95% mean confidence interval for instructions value: 25.43 40.03
95% mean confidence interval for instructions %-change: 3.60% 4.33%
Instructions are HURT.

total loops in shared programs: 5847 -> 5847 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 839329852 -> 845491482 (0.73%)
cycles in affected programs: 130229434 -> 136391064 (4.73%)
helped: 1098
HURT: 2228
helped stats (abs) min: 1 max: 130102 x̄: 1340.64 x̃: 22
helped stats (rel) min: <.01% max: 64.25% x̄: 4.03% x̃: 0.71%
HURT stats (abs)   min: 1 max: 185309 x̄: 3426.24 x̃: 87
HURT stats (rel)   min: <.01% max: 92.85% x̄: 8.12% x̃: 3.82%
95% mean confidence interval for cycles value: 1342.16 2362.97
95% mean confidence interval for cycles %-change: 3.70% 4.52%
Cycles are HURT.

total spills in shared programs: 10768 -> 11856 (10.10%)
spills in affected programs: 9717 -> 10805 (11.20%)
helped: 25
HURT: 28

total fills in shared programs: 13720 -> 16258 (18.50%)
fills in affected programs: 12016 -> 14554 (21.12%)
helped: 25
HURT: 28

total sends in shared programs: 1034790 -> 1031266 (-0.34%)
sends in affected programs: 33416 -> 29892 (-10.55%)
helped: 1005
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.51 x̃: 3
helped stats (rel) min: 1.69% max: 60.00% x̄: 15.20% x̃: 14.08%
95% mean confidence interval for sends value: -3.72 -3.29
95% mean confidence interval for sends %-change: -15.82% -14.57%
Sends are helped.

LOST:   26
GAINED: 183

shader-db on a number of VK/DX titles on DG2 :

 PERCENTAGE DELTAS  Shaders   Instrs    Cycles
 age_of_wonders_III 1928      +0.02%    -0.19%

 PERCENTAGE DELTAS       Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Max live registers Max dispatch width
 assassins_creed_odyssey 2119      +1.12%    -0.42%      -0.03%        -0.29%       -9.10%     -4.26%         -0.64%             +0.65%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Spill count Fill count Max live registers
 aztec_ruins_high  269       -0.05%    -0.45%     -0.29%     -7.27%         -0.33%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Max live registers Max dispatch width
 dark_souls_3_dxvk_g2 1420      +0.09%    +0.24%        +0.21%             +0.12%

(stats look bad, but it's just one shader affected)
 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Spill count Fill count Scratch Memory Size Max live registers
 fallout_4_dxvk_g2 1638      +0.67%    +8.32%    +16.02%     +7.17%         +100.00%            +0.48%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Send messages Spill count Fill count Max live registers Max dispatch width
 red_dead_redemption2 5969      +0.16%    -0.04%      -0.04%       +0.01%     +0.05%         -0.20%             +0.04%

 PERCENTAGE DELTAS          Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 rise_of_the_tomb_raider_g2 12129     +2.19%    +1.36%      -1.23%          -0.36%             +2.04%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers
 shooter-game      693       +0.07%    -0.89%      -0.09%          -0.09%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 talos_g2          1140      +0.37%    +3.80%      -0.86%          -0.67%             +0.19%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Max live registers Max dispatch width
 total_war_warhammer2 477       +0.25%    +0.66%        -0.17%             +0.10%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 witcher_3_dxvk_g2 1074      +0.75%   -10.45%      -0.15%          -0.16%             -0.16%

 PERCENTAGE DELTAS      Shaders   Instrs    Cycles  Send messages Max live registers
 wolfenstein_youngblood 1111      +0.52%    +0.66%      -0.59%          -0.03%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
4a23a5a904 nir: add a new ubo uniform loading intrinsic for intel
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
7eb1e2a690 intel/fs: avoid reusing the VGRF for uniform load_ubo
Only found 3 shaders affected in Red Dead Redemption :

Totals from 3 (0.05% of 5969) affected shaders:
Instrs: 2246 -> 2230 (-0.71%)
Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05%

This will have a larger effect when we add the
load_ubo_uniform_block_intel intrinsic where we will have larger
blocks (vec8/vec16 vs vec4 only now).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
ff3494fce3 intel/fs: print indentation for control flow
INTEL_DEBUG=optimizer output changes from :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42: txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43: and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44: cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45: (+f0.0) if(8) (null):UD,
{ 11}   46: mov(8) vgrf270:D, -1082130432d
{ 12}   47: mov(8) vgrf271:D, 1082130432d
{ 14}   48: mov(8) vgrf274+0.0:D, 0d
{ 14}   49: mov(8) vgrf274+1.0:D, 0d

to :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42:   txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43:   and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44:   cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45:   (+f0.0) if(8) (null):UD,
{ 11}   46:     mov(8) vgrf270:D, -1082130432d
{ 12}   47:     mov(8) vgrf271:D, 1082130432d
{ 14}   48:     mov(8) vgrf274+0.0:D, 0d
{ 14}   49:     mov(8) vgrf274+1.0:D, 0d

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
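The output change shown in ff3494fce3 can be reproduced with a simple depth counter. The Rust sketch below is an illustration only: `indent_control_flow` is a hypothetical function, the opcode names are plain strings, and the handling of `else`, `endif`, `do`, and `while` is an assumption, since the commit message only shows nested `if` instructions. Two spaces per level matches the sample output.

```rust
/// Prefix each opcode with two spaces per enclosing control-flow level.
fn indent_control_flow(ops: &[&str]) -> Vec<String> {
    let mut depth = 0usize;
    let mut out = Vec::with_capacity(ops.len());
    for op in ops {
        // Closing opcodes are printed at the level of their opener.
        if matches!(*op, "endif" | "else" | "while") {
            depth = depth.saturating_sub(1);
        }
        out.push(format!("{}{}", "  ".repeat(depth), op));
        // Opening opcodes indent everything that follows them.
        if matches!(*op, "if" | "else" | "do") {
            depth += 1;
        }
    }
    out
}
```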
Karol Herbst
5b3ff7e3f3 rusticl/queue: overhaul of the queue+event handling
This new approach handles things as follows:
1. Fences won't be attached to events anymore, applications only wait on
   the cv attached to the event.
2. Only the queue is allowed to update event status for non user events.
   This will eliminate all remaining status updating races between the
   queue and applications waiting on events.
3. The queue minimizes flushing by bundling events.
4. Increase cv wait timeout as there is really no point in waking up too
   often.

Reduces the number of emitted fences on radeonsi in luxmark 3.1 luxball by 90%.

Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23612>
2023-06-14 11:14:46 +00:00
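The event model listed in 5b3ff7e3f3 can be sketched with the standard library's `Mutex` and `Condvar`. This is a simplified illustration, not the rusticl code: `Event`, `Status`, and `complete_bundle` are hypothetical names, and the one-second timeout is an arbitrary placeholder because the commit message does not give the real value.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Queued,
    Complete,
}

/// An event that applications wait on through a condition variable only.
struct Event {
    status: Mutex<Status>,
    cv: Condvar,
}

impl Event {
    fn new() -> Arc<Self> {
        Arc::new(Event { status: Mutex::new(Status::Queued), cv: Condvar::new() })
    }

    /// Queue side: the only place where the status changes (point 2).
    fn set_status(&self, status: Status) {
        *self.status.lock().unwrap() = status;
        self.cv.notify_all();
    }

    /// Application side: block on the cv with a long timeout (points 1 and 4).
    fn wait(&self) -> Status {
        let mut guard = self.status.lock().unwrap();
        while *guard != Status::Complete {
            let (next, _timed_out) =
                self.cv.wait_timeout(guard, Duration::from_secs(1)).unwrap();
            guard = next;
        }
        *guard
    }
}

/// Queue side: after a single flush covering the bundle, complete every
/// event in it (point 3). The flush itself is omitted from this sketch.
fn complete_bundle(bundle: &[Arc<Event>]) {
    for event in bundle {
        event.set_status(Status::Complete);
    }
}
```

Because only the queue calls `set_status`, a waiter can never race with another thread that is updating the same status.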
Iago Toral Quiroga
6114e66124 broadcom/compiler: only use last thread switch flag to detect final section
Since commit 'c98ddc778a3 broadcom/compiler: force a last thrsw for spilling'
we always ensure we signal the last thread section explicitly with a
last thread switch.

Relying on VPM stores to detect the last thread section is particularly bad,
because we can have VPM stores occurring quite early in a shader program,
which would disable TMU spilling almost entirely.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22461>
2023-06-14 09:27:50 +00:00
Alejandro Piñeiro
dfdbf5bf94 broadcom/compiler: clarify use of QFILE_VPM
This was only used for version < 40 (See commit 22a02f3e3).

Adding some extra explanations and asserts of places where it is used.

While we are here, also move the definition of a register with QFILE_VPM
to avoid defining it when it is not needed.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22984>
2023-06-14 09:03:35 +00:00
Lionel Landwerlin
0cd9f0c3d3 intel/fs: fix bindless/shared surface mistake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 068bf1378d ("intel/fs: enable SSBO accesses through the bindless heap")
Tested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23536>
2023-06-14 07:42:57 +00:00
Lionel Landwerlin
b3b12c2c27 anv: enable CmdCopyQueryPoolResults to use shader for copies
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23074>
2023-06-14 09:43:57 +03:00
Lionel Landwerlin
e86f3c7abb intel/ds: add query count in query tracepoints
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23074>
2023-06-14 09:43:57 +03:00
Lionel Landwerlin
930e862af7 anv: add shaders for copying query results
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23074>
2023-06-14 09:43:57 +03:00