These were only used in one place. Drop them and convert to using the
new register packers for improved cross-gen portability.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
The non-variant reg packers were removed in commit fd6489c026 ("tu:
Drop emitting of deprecated packing."), along with
FD_NO_DEPRECATED_PACK.
Add support to mark the even older reg builders as deprecated, and
re-introduce FD_NO_DEPRECATED_PACK to control this.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
If the builder is passed just a raw value, like an iova in the case of
turnip, we probably don't want to assert that it is all unknown bits.
This hasn't been a problem for the gallium driver, as the bo pointer is
first. But became a problem for turnip with commit 2d6c15ad57 ("tu:
remove magic bo reg packing (use iovas directly)").
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
sam.s2en uses the first src for its samp/tex while the coordinates (for
which derivatives need to be calculated) are in the second src. We used
to unconditionally track needed helpers for the first src causing (eq)
to potentially get scheduled too early for sam.s2en. Fix this by using
the second src for sam.s2en.
Fixes frame instability in Metro Exodus.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 29f8277952 ("ir3/legalize: schedule (eq) more accurately")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38992>
Previously, we would spill at the NIR level any temp array over 16 vec4s.
This had two problems:
1) We wouldn't spill for the worst case scenario: a MAD accessing a dst
array and 3 different src arrays (that all get fully unspilled, rather
than just reloading the specific reg in the operand). This would fail to
register allocate. We haven't seen this in practice.
2) We would spill vec4[17] and larger arrays that weren't necessary to get
the shader to register allocate. This occurred on a FS for in Stray that
had a vec4[24] array and just 4 vec4s of register pressure other than the
array.
Instead, use NIR scratch spilling when the worst case set of vars to
reference in an instruction would overflow GPR space. This makes the
shader in Stray go from 11ms to .5ms, by eliminating all spilling and
leaving the array in GPRs. On the other hand, if leaving the arrays
unspilled in NIR means that we cause spilling in ir3, the fact that ir3
spills/reloads work on the whole array may cause the amount of spilling to
increase. However, we can see the effect is very small in terms of number
of shaders affected in shader-db and an overwhelmingly positive effect on
spills:
MaxWaves: 22522470 -> 22520664 (-0.01%)
Instrs: 396093281 -> 396122221 (+0.01%); split: -0.00%, +0.01%
STPs: 218915 -> 182907 (-16.45%)
LDPs: 155374 -> 153364 (-1.29%); split: -2.79%, +1.50%
Totals from 496 (0.03% of 1561298) affected shaders:
MaxWaves: 3792 -> 1986 (-47.63%)
Instrs: 441224 -> 470164 (+6.56%); split: -0.00%, +6.57%
CodeSize: 926164 -> 976734 (+5.46%); split: -0.05%, +5.52%
NOPs: 58896 -> 52765 (-10.41%); split: -14.95%, +4.60%
MOVs: 16314 -> 57901 (+254.92%)
COVs: 3293 -> 5146 (+56.27%)
Full: 12876 -> 23632 (+83.54%)
(ss): 18613 -> 11573 (-37.82%); split: -47.53%, +9.71%
(sy): 2539 -> 2505 (-1.34%); split: -10.75%, +9.41%
(ss)-stall: 40682 -> 26413 (-35.07%); split: -47.90%, +12.80%
(sy)-stall: 147862 -> 117004 (-20.87%); split: -37.65%, +16.69%
STPs: 38566 -> 2558 (-93.37%)
LDPs: 5060 -> 3050 (-39.72%); split: -85.77%, +45.93%
Cat0: 65593 -> 59487 (-9.31%); split: -13.42%, +4.15%
Cat1: 19667 -> 63105 (+220.87%)
Cat2: 155958 -> 157879 (+1.23%); split: -0.05%, +1.28%
Cat6: 105228 -> 94910 (-9.81%); split: -12.36%, +2.54%
Cat7: 2480 -> 2485 (+0.20%); split: -0.08%, +0.28%
Subgroup size: 31872 -> 31744 (-0.40%)
The primary impacted application from shader-db is gfxbench aztec ruins.
A quick test of it showed no significant performance improvement (n=3).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
Now that qcom has released the gpu firmware for the a750, let's stop
using my fw package in favor of the publicly-available ones.
v2:
* Be more specific in the list of files we want to keep (lumag)
* Uprev the linux firmware version
* Use gfx-ci/firmware rather than the upstream gitlab repo
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>
We also need to do this in the GLES-only code-path, otherwise we'll end
up setting PIPE_BIND_RENDER_TARGET for these, which means we'll
incorrectly require these to be color-renderable.
Fixes: 60e115dedf ("mesa/st: do not drop binding prematurely")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38945>
The VK_ERROR_FRAGMENTED_POOL and VK_ERROR_OUT_OF_POOL_MEMORY errors are
not as exceptional cases as most. These are expected to be hit by
applications in the normal course of doing their thing. Probably best
not to spam stderr and the debug logs with them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38940>
The pitch is in bytes, rather than pixels, whereas internally lrz_layout
uses a pitch in pixels. Adjust the xml and state emit accordingly.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38930>
Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff
as-if hand written:
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>
Instructions that calculate derivatives (whether implicitly or
explicitly) don't actually need helpers enabled as long as helpers were
enabled while their coordinates were calculated. We currently don't
track this and leave helpers enabled until the derivative instructions
themselves.
Improve this by adding a backwards data-flow analysis which tracks the
last instruction that wrote the coordinates so that helpers can be
disabled after that.
Totals from 38306 (23.26% of 164705) affected shaders:
Instrs: 19635952 -> 19647753 (+0.06%); split: -0.03%, +0.09%
CodeSize: 40465212 -> 40489860 (+0.06%); split: -0.03%, +0.09%
NOPs: 3493898 -> 3505699 (+0.34%); split: -0.16%, +0.49%
(ss)-stall: 1755983 -> 1755365 (-0.04%); split: -0.04%, +0.01%
(sy)-stall: 5345890 -> 5350570 (+0.09%); split: -0.03%, +0.12%
Last helper: 8754510 -> 6313744 (-27.88%); split: -27.89%, +0.01%
Cat0: 3821218 -> 3833019 (+0.31%); split: -0.14%, +0.45%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36410>
This shows off how we don't need to pass an explicit size per CRB instance
in our non-growable CSes.
However, I don't like the additional indentation I did to make a CRB go
out of scope when I needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38762>
Loosely based on freedreno's, but simplified since a lot of overflow
handling was already there in tu_cs. It successfully catches issues of:
- Overflowing the CRB reservation
- Starting a new CRB with one in progress.
- Emitting a pkt4 while a CRB emit is in progress.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38762>
Replace the duplicated swapchain image detection pattern across all
Vulkan drivers with the new wsi_common_is_swapchain_image() helper.
Since the swapchain handle can be extracted from VkImageCreateInfo's
pNext chain inside wsi_common_create_swapchain_image(), remove the
now-redundant VkSwapchainKHR parameter from that function.
This removes the #ifdef guards for Android/WSI platforms from each
driver, as the helper now handles this uniformly.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38541>
cat2_may_use_scalar_alu() was incorrect because the instruction could
use an indirectly-accessed const where a0.x (i.e. the offset) is
non-uniform. Fortunately, we already know whether this is the case,
because the original instruction would then write a non-shared GPR.
Also, the restrictions for scalar ALU are the same regardless of whether
we write up0.x or a shared GPR, and vice versa the restrictions for
normal cat2 are the same regardless of whether we write p0.x or a
non-shared GPR, so it should always be safe to write p0.x if non-shared
and up0.x if shared. So, just do that.
Fixes: 2a8c5ebc77 ("ir3: enable scalar predicates")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38895>
This is about writes to shared regs, not GPR (as early-preamble can only
use shared regs). It's a pretty hypothetical case, but might as well
get it correct.
Fixes: 189e494249 ("ir3: Add (sy) before end of preamble when necessary")
Reported-by: Job Noorman <jnoorman@igalia.com>
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38868>
We forgot to call tu_fill_render_pass_state when resuming because it was
mixed in with emitting commands for the start of the subpass. Fix that
by pulling it out. This adds some duplication, but I think it's better
than mixing command emission and CPU-side state setup in the same
function.
Fixes: cb0f414b2a ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38873>
Add support for A840 and X2-85. Despite slice count, differences in
memory bus and clks, they are architecturally similar from the PoV of
the UMD.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>
It was missed that this register changed for larger bin sizes. Use a
common bitset for all related gen8 regs, and change the field names for
earlier gens to match so the generated register packers dtrt.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38450>