Move the enabled storage_8bit property toggle into the base a7xx GPUProps
class. This enables storageBuffer8BitAccess Vulkan feature on all a7xx
hardware, much like the proprietary driver does. It's also a required
feature with Vulkan 1.4.
Fixes: dEQP-VK.info.device_mandatory_features on pre-a750 a7xx hardware.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39124>
Turnip has to build/munge descriptors in a bunch of places. Extract out
helpers so we don't have to duplicate the gen8 vs earlier descriptor
format changes everywhere.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39141>
The descriptor format changes in gen8, which is mostly abstracted by
the fdl helpers. But to utilize that we need to plumb the CHIP thru
12 layers for different ways of vk doing the same thing.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39141>
DEPTH field is actually STRUCTSIZETEXELS. And for buffers the iova only
needs to be byte aligned, replacing STARTOFFSETTEXELS on a6xx/a7xx.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39141>
Add astc hdr (float) formats, those get treated identically as the ldr
formats as the blocks have enough metadata to be decoded as float.
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38859>
They have the same rules for placement as (eq).
Blob places them right after the last cat5/cat6 instruction if
possible, we do the same for now.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Co-authored-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31885>
The new a7xx NOP flags (eolm), (eogm) have similar to (eq)
constraints and helper_sched could be used for them with minimal
modifications.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Co-authored-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31885>
Will make it easier to adapt to gen8 changes.
Note that with the change we no longer emit RB_VIEWPORT_ZCLAMP_MIN/MAX
if viewport_count==0. But this should not be a valid case, so no
functional change is expected.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39052>
This broke a number of dEQP tests, for reasons that are unclear -- in
looking at cffdumps, the const setup looks appropriate, just changing from
e.g:
VS_CTRL_REG1(CONSTLEN=2)
FS_CTRL_REG1(CONSTLEN=1)
LOAD_STATE6(VERT, DIRECT, DST_OFF=2, NUM_UNIT=2)
LOAD_STATE6(FRAG, DIRECT, DST_OFF=0, NUM_UNIT=2)
to:
VS_CTRL_REG1(CONSTLEN=2)
FS_CTRL_REG1(CONSTLEN=1)
LOAD_STATE6(VERT, DIRECT, DST_OFF=2, NUM_UNIT=2)
LOAD_STATE6(FRAG, INDIRECT, DST_OFF=0, NUM_UNIT=2)
Fixes: 203ac73374 ("gallium: set prefer_real_buffer_in_constbuf0 for all drivers using tc")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39070>
Drop the reg_config table trickery, which doesn't play nicely with
register changes across generations. (Note, some of the registers,
like PC_HS_CNTL, are not in fact the same as other shader stages.)
While at it, convert to crb to simplify copying code from the gallium
driver.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
Rework it to take all active/enabled shader stages in one shot, to
simplify things and drop the xs_configs table.
This lets us use the variant reg packers directly to better deal with
register changes across generations.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
Drop the legacy register offset macros. And re-work how we map the vk
query to pipeline stat offset, to account for re-ordering of the
pipeline stat regs in gen8.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
These were only used in one place. Drop them and convert to using the
new register packers for improved cross-gen portability.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
The non-variant reg packers were removed in commit fd6489c026 ("tu:
Drop emitting of deprecated packing."), along with
FD_NO_DEPRECATED_PACK.
Add support to mark the even older reg builders as deprecated, and
re-introduce FD_NO_DEPRECATED_PACK to control this.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
If the builder is passed just a raw value, like an iova in the case of
turnip, we probably don't want to assert that it is all unknown bits.
This hasn't been a problem for the gallium driver, as the bo pointer is
first. But became a problem for turnip with commit 2d6c15ad57 ("tu:
remove magic bo reg packing (use iovas directly)").
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39029>
sam.s2en uses the first src for its samp/tex while the coordinates (for
which derivatives need to be calculated) are in the second src. We used
to unconditionally track needed helpers for the first src causing (eq)
to potentially get scheduled too early for sam.s2en. Fix this by using
the second src for sam.s2en.
Fixes frame instability in Metro Exodus.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 29f8277952 ("ir3/legalize: schedule (eq) more accurately")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38992>
Previously, we would spill at the NIR level any temp array over 16 vec4s.
This had two problems:
1) We wouldn't spill for the worst case scenario: a MAD accessing a dst
array and 3 different src arrays (that all get fully unspilled, rather
than just reloading the specific reg in the operand). This would fail to
register allocate. We haven't seen this in practice.
2) We would spill vec4[17] and larger arrays that weren't necessary to get
the shader to register allocate. This occurred on a FS for in Stray that
had a vec4[24] array and just 4 vec4s of register pressure other than the
array.
Instead, use NIR scratch spilling when the worst case set of vars to
reference in an instruction would overflow GPR space. This makes the
shader in Stray go from 11ms to .5ms, by eliminating all spilling and
leaving the array in GPRs. On the other hand, if leaving the arrays
unspilled in NIR means that we cause spilling in ir3, the fact that ir3
spills/reloads work on the whole array may cause the amount of spilling to
increase. However, we can see the effect is very small in terms of number
of shaders affected in shader-db and an overwhelmingly positive effect on
spills:
MaxWaves: 22522470 -> 22520664 (-0.01%)
Instrs: 396093281 -> 396122221 (+0.01%); split: -0.00%, +0.01%
STPs: 218915 -> 182907 (-16.45%)
LDPs: 155374 -> 153364 (-1.29%); split: -2.79%, +1.50%
Totals from 496 (0.03% of 1561298) affected shaders:
MaxWaves: 3792 -> 1986 (-47.63%)
Instrs: 441224 -> 470164 (+6.56%); split: -0.00%, +6.57%
CodeSize: 926164 -> 976734 (+5.46%); split: -0.05%, +5.52%
NOPs: 58896 -> 52765 (-10.41%); split: -14.95%, +4.60%
MOVs: 16314 -> 57901 (+254.92%)
COVs: 3293 -> 5146 (+56.27%)
Full: 12876 -> 23632 (+83.54%)
(ss): 18613 -> 11573 (-37.82%); split: -47.53%, +9.71%
(sy): 2539 -> 2505 (-1.34%); split: -10.75%, +9.41%
(ss)-stall: 40682 -> 26413 (-35.07%); split: -47.90%, +12.80%
(sy)-stall: 147862 -> 117004 (-20.87%); split: -37.65%, +16.69%
STPs: 38566 -> 2558 (-93.37%)
LDPs: 5060 -> 3050 (-39.72%); split: -85.77%, +45.93%
Cat0: 65593 -> 59487 (-9.31%); split: -13.42%, +4.15%
Cat1: 19667 -> 63105 (+220.87%)
Cat2: 155958 -> 157879 (+1.23%); split: -0.05%, +1.28%
Cat6: 105228 -> 94910 (-9.81%); split: -12.36%, +2.54%
Cat7: 2480 -> 2485 (+0.20%); split: -0.08%, +0.28%
Subgroup size: 31872 -> 31744 (-0.40%)
The primary impacted application from shader-db is gfxbench aztec ruins.
A quick test of it showed no significant performance improvement (n=3).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37245>
Now that qcom has released the gpu firmware for the a750, let's stop
using my fw package in favor of the publicly-available ones.
v2:
* Be more specific in the list of files we want to keep (lumag)
* Uprev the linux firmware version
* Use gfx-ci/firmware rather than the upstream gitlab repo
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38977>