While not currently required, it will be for future GPUs.
Also cleans up gpu_id as parameter to some functions that didn't use it.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40610>
The current implementation is a bit awkward and becomes tricky when
adding support for 64 bit gpu_ids.
Rather than keeping a mask of bits in gpu_id to compare with the stored
gpu_prod_id value, rely on macro functions for fetching the information
required from gpu_id and creating the comparison value.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40610>
Rather than having preload registers hardcoded over multiple files,
gather them in one place with an enum abstraction.
This should simplify updates to the preload registers.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40643>
genX_cmd_compute.c has 2 places that is had a code very similar to
anv_shader_get_scratch_surf() but we could not make use of this function without
change it parameters.
Now it takes the shader stage and the total_scratch instead of anv_shader because
cmd_buffer_trace_rays() don't have a shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
We will need to call get_scratch_surf() from other files, so here removing the
static and adding it to anv_private.h.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
We are already at our limit of 31 texture opcodes, and cannot add any
more without expanding the opcode hashing in nir_instr_set. Thankfully,
it's at 29 bits, so adding one here is possible still.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
New on Xe2, this instruction enables faster 32x32 integer multiply at the cost
of extra accumulator usage. Add it to the opcode list for future use.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
A set is large and expensive to iterate.
This is faster (overall fossilize-replay difference):
Difference at 95.0% confidence
-250 +/- 28.9257
-2.04849% +/- 0.235211%
(Student's t, pooled s = 34.1626)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
This would have been fine for a set, but we're going to use a dynarray for
the predecessors in the future.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
If both src and dst are compressed formats, adjusting the extent isn't
necessary because it's required that texel block extent matches. The
previous division was also wrong because it was truncating partial
blocks causing issues in some tests.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40678>
Fix non-functional issue where calls to cs_run_fragment() had swapped
tile_order and enable_tem arguments. Both arguments evaluate to 0.
Hence, no functional change.
Fixes: 53f780ec91 ("panfrost: Remove progress_increment from all CS builders")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40840>
Replace the old partial IR state snapshotting with prebuilt
per-pass IR descriptor buffers and a queue-attached scratch FBD.
1. emit FBD+DBD+RTDs for each IR-pass+layer
2. store it into some side-band GPU buffer that's passed around
through the queue context
3. in the execption handler, copy FBD+DBD+RTDs from the IR desc
buffer to some space that's attached to the queue context and
not the cmdbuf
Fixes: 46f611c9 ("panvk: Also use resolve shaders for Z/S")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Ryan zhang <ryan.zhang@nxp.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40625>
We have 11 new RPL-U Brya Chromebooks in the Collabora lab, allowing the
full VKCTS test suite to run pre-merge for the first time without a
fraction.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40782>
qualityLevel as a speed/quality tradeoff is better match for encode presets
than the tuning mode. In addition we can support more quality levels than
what is available from tuning modes, this adds support for High Quality
preset on VCN4+.
This also makes it possible to use Low Latency tuning with balance and
quality presets.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40765>
By using the constant path we can combine the v_and and the v_cmp.
Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 2833 -> 2831 (-0.07%)
Latency: 27385 -> 27367 (-0.07%)
InvThroughput: 1712 -> 1710 (-0.12%)
VALU: 1301 -> 1299 (-0.15%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705>
The pvr_clear_vdm_state_get_size_in_dw() wrongly think instance count
inputs are needed when doing RTA clear for cores without the
gs_rta_support feature. However, the instance ID is exploited to output
the target layer ID, which isn't supported at all for cores w/o that
feature, so it looks that the condition is inverted. In addition, the
pvr_pack_clear_vdm_state() function seems to have similar logic deciding
whether to emit instance_count, and the logic is opposite to the logic
in pvr_clear_vdm_state_get_size_in_dw() for the part checking the
gs_rta_support feature.
Invert the condition to take instance ID inputs for cores with the
gs_rta_support feature instead of those without this feature.
Fixes: b59eb30e88 ("pvr: Fix cs corruption in pvr_pack_clear_vdm_state()")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40831>
Those were relevent for Fermi or just the Gallium driver.
For the vertex runout, it is implemented a bit after
(SET_VERTEX_STREAM_SUBSTITUTE_A)
I also rewrote the comment about CSAA_ENABLE as it is still relevent.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
NVIDIA proprietary driver set 4 for UNORM8 and SRGB8, let's match this.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
NVIDIA proprietary driver does that, we were missing this and possibly
making the VAF (Vertex Attribute Fetch) unit evict the first entry
instead if nothing was setting it.
The golden ctx already set it for us at least on Ada but for consistency
let's make sure it's set here in case this is different on other
generations.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
There is no reasons to cut 48KiB of memory out of L1 cache on gfx
considering that we do not have shared memory and that local
memory does not need to be directly addressable.
This is not set by NVIDIA proprietary driver and the golden ctx setup keep it
uninitialized.
Unsure if that will change anything in term of performance but it might reduce
L1 cache usage on 3D.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
We are already doing this in nvk_push_draw_state_init there is no need
for the extra DMA fill.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>