This means we don't reserve the registers, which improves RA
considerably. Using a special preload psuedo-op instead of a regular
move allows us to constrain semantics and gaurantee coalescing.
shader-db on glmark2 subset:
total instructions in shared programs: 6448 -> 6442 (-0.09%)
instructions in affected programs: 230 -> 224 (-2.61%)
helped: 4
HURT: 0
total bytes in shared programs: 42232 -> 42196 (-0.09%)
bytes in affected programs: 1530 -> 1494 (-2.35%)
helped: 4
HURT: 0
total halfregs in shared programs: 2291 -> 1926 (-15.93%)
halfregs in affected programs: 2185 -> 1820 (-16.70%)
helped: 75
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18804>
This makes the dataflow easier to read, especially with splits and
collects (which take variable numbers of sources/destinations).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18804>
We know that phi nodes are always at the start (this is asserted in
agx_validate and a fundamental invariant of SSA form). That means we can
cheaply iterate all n phi nodes forward (or n non-phi nodes backwards)
in O(n) time. We already open code this idiom in a few places, use
common iterators instead so we don't need to justify in random places.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18804>
this ensures there's no weird perf happening, avoids using renderpass
instead of dynamic rendering, and avoids hitting an assert from broken
framebuffer construction
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19065>
Dont bump the depth if the application attempts to overflow or
underflow the stack.
Fixes: 6febe2b880 ("glthread: track all matrix stack depths")
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19059>
In that case we need to use the sysval. That sysval can be optimized anyway in
the nonvariable case. Fixes test_basic.get_linear_ids on panfrost.
Fixes: 998d84fca5 ("nir/lower_system_values: Support lowering more intrinsics")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18662>
Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.
The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling. Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.
We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.
One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains). That
said, it's still better than before.
For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).
shader-db results on Icelake:
total instructions in shared programs: 19600314 -> 19597528 (-0.01%)
instructions in affected programs: 65338 -> 62552 (-4.26%)
helped: 271 / HURT: 0
helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
95% mean confidence interval for instructions value: -10.71 -9.85
95% mean confidence interval for instructions %-change: -6.17% -5.43%
Instructions are helped.
total cycles in shared programs: 851842332 -> 851808165 (<.01%)
cycles in affected programs: 618577 -> 584410 (-5.52%)
helped: 271 / HURT: 0
helped stats (abs) min: 64 max: 540 x̄: 126.08 x̃: 111
helped stats (rel) min: 2.57% max: 37.97% x̄: 6.12% x̃: 5.06%
95% mean confidence interval for cycles value: -135.35 -116.80
95% mean confidence interval for cycles %-change: -6.67% -5.57%
Cycles are helped.
total sends in shared programs: 1025238 -> 1024308 (-0.09%)
sends in affected programs: 6454 -> 5524 (-14.41%)
helped: 271 / HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
95% mean confidence interval for sends value: -3.57 -3.29
95% mean confidence interval for sends %-change: -15.42% -14.54%
Sends are helped.
According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.
v2: Fix assertions about number of components and add more of them.
Combine the quads and triangles handling as it's nearly identical.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19061>
This boring refactor:
- makes it consistent for extension name alias
- shortens the line a bit to not further regress line width
- applies macro when possible to be consistent
- removes some redundant empty lines
v2: rebase
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18993>
if a resource is in use every frame and never goes idle, it becomes
impossible to execute pruning, as there is no tracking for when views
are no longer in use
to avoid eventually ooming in this scenario, add some data to zink_resource_object
which can effectively "queue" pruning of these views if ballooning is
detected at a time when the views are guaranteed to be safe to delete
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19056>
the existing model for descriptor pools works thusly:
* allocate pool
* when possible, reuse existing pool from overflow array
* when pool is full, append to overflow array
* on batch reset, cycle the overflow arrays to enable reuse
the problem with this is it uses two separate arrays for overflow, and these arrays
are cycled but never consolidated, leading to wasted memory allocations for whichever
array is not currently being recycled
to resolve this, make the following changes to batch resets:
* set the recycle index to whichever overflow array is smaller
* copy the elements from the smaller array to the larger array (consolidate)
* set the number of elements in the smaller array to 0
this leads to always appending to an empty array while the array containing all the
"free" pools is always the one that's available for recycled use
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19053>