Previously, we would spill at the NIR level any temp array over 16 vec4s.
This had two problems:
1) We wouldn't spill for the worst case scenario: a MAD accessing a dst
array and 3 different src arrays (that all get fully unspilled, rather
than just reloading the specific reg in the operand). This would fail to
register allocate. We haven't seen this in practice.
2) We would spill vec4[17] and larger arrays that weren't necessary to get
the shader to register allocate. This occurred on a FS for in Stray that
had a vec4[24] array and just 4 vec4s of register pressure other than the
array.
Instead, use NIR scratch spilling when the worst case set of vars to
reference in an instruction would overflow GPR space. This makes the
shader in Stray go from 11ms to .5ms, by eliminating all spilling and
leaving the array in GPRs. On the other hand, if leaving the arrays
unspilled in NIR means that we cause spilling in ir3, the fact that ir3
spills/reloads work on the whole array may cause the amount of spilling to
increase. However, we can see the effect is very small in terms of number
of shaders affected in shader-db and an overwhelmingly positive effect on
spills:
MaxWaves: 22522470 -> 22520664 (-0.01%)
Instrs: 396093281 -> 396122221 (+0.01%); split: -0.00%, +0.01%
STPs: 218915 -> 182907 (-16.45%)
LDPs: 155374 -> 153364 (-1.29%); split: -2.79%, +1.50%
Totals from 496 (0.03% of 1561298) affected shaders:
MaxWaves: 3792 -> 1986 (-47.63%)
Instrs: 441224 -> 470164 (+6.56%); split: -0.00%, +6.57%
CodeSize: 926164 -> 976734 (+5.46%); split: -0.05%, +5.52%
NOPs: 58896 -> 52765 (-10.41%); split: -14.95%, +4.60%
MOVs: 16314 -> 57901 (+254.92%)
COVs: 3293 -> 5146 (+56.27%)
Full: 12876 -> 23632 (+83.54%)
(ss): 18613 -> 11573 (-37.82%); split: -47.53%, +9.71%
(sy): 2539 -> 2505 (-1.34%); split: -10.75%, +9.41%
(ss)-stall: 40682 -> 26413 (-35.07%); split: -47.90%, +12.80%
(sy)-stall: 147862 -> 117004 (-20.87%); split: -37.65%, +16.69%
STPs: 38566 -> 2558 (-93.37%)
LDPs: 5060 -> 3050 (-39.72%); split: -85.77%, +45.93%
Cat0: 65593 -> 59487 (-9.31%); split: -13.42%, +4.15%
Cat1: 19667 -> 63105 (+220.87%)
Cat2: 155958 -> 157879 (+1.23%); split: -0.05%, +1.28%
Cat6: 105228 -> 94910 (-9.81%); split: -12.36%, +2.54%
Cat7: 2480 -> 2485 (+0.20%); split: -0.08%, +0.28%
Subgroup size: 31872 -> 31744 (-0.40%)
The primary impacted application from shader-db is gfxbench aztec ruins.
A quick test of it showed no significant performance improvement (n=3).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
This is mostly a s/dyn ShaderModel/ShaderModelInfo/ with a few manual fixes.
With this change, we now statically dispatch into ShaderModel, which is
a bit faster than dynamically dispatching. Together, this commit and the
last one improve compile times by about 1% geomean.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38913>
Because the tracked registers are really driver dependant, the driver
is expected to handle the tracked_registers struct itself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
Both PVR and PanVK are drivers for generic embedded GPU IP cores, so
just take the can_present_on_device implementation from PanVK, which
allows any platform devices for presentation.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38985>
Now that qcom has released the gpu firmware for the a750, let's stop
using my fw package in favor of the publicly-available ones.
v2:
* Be more specific in the list of files we want to keep (lumag)
* Uprev the linux firmware version
* Use gfx-ci/firmware rather than the upstream gitlab repo
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>
We also need to do this in the GLES-only code-path, otherwise we'll end
up setting PIPE_BIND_RENDER_TARGET for these, which means we'll
incorrectly require these to be color-renderable.
Fixes: 60e115dedf ("mesa/st: do not drop binding prematurely")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38945>
This is not actually necessary and moreover was corrupting
mipmapped arrayed 2D images in cases when the transition barrier
wasn't transitioning all mips, but more than one layer.
Keep the layout transition infrastructure in place as we'll need
it for transaction elimination CRC zeroing on v10-.
Fixes: c95f8993 ("panvk: add a meta command for transitioning image layout")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38972>
Vulkan allows destroying an image without destroying the views of
this image first. These views can not be used in any way and the
only thing that the user can do with such a view is destroy it.
This also means that the driver can not refer to the image inside
the image view's destructor.
Fixes cb3f6481 ("panvk: Create MS shadow images and views")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38972>
When swapping buffer with damage regions, to be strictly correct we
need to swap the entire back buffer to the front buffer. This needs to
be done in case the compositor does not support damage regions. This
means we need to ignore the input damage region and tell drisw to swap
the entire buffer.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38817>
It is enough to compute them after upload.
This saves some disk space and eliminates an unlikely
bug where the shader cache is shared between two GPUs
with the same chip but a different number of enabled CUs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38970>
On Xe3+, we have typed MSAA load/store message support. We can use them
during MSAA copies. We don't have to fallback on RCS companion queue
anymore.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
We are passing number of layers as inline parameter register, so figure
out z_pos and write to 2D MSAA array images in compute
shader. We already get component X, Y and sample index, all we needed
was the number of layers.
Ken:
- Use load/store var instead of derefs
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
Only 3D shader gets dispatched per sample not the compute shader.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33905>
The VK_ERROR_FRAGMENTED_POOL and VK_ERROR_OUT_OF_POOL_MEMORY errors are
not as exceptional cases as most. These are expected to be hit by
applications in the normal course of doing their thing. Probably best
not to spam stderr and the debug logs with them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38940>