genX_cmd_compute.c has 2 places that is had a code very similar to
anv_shader_get_scratch_surf() but we could not make use of this function without
change it parameters.
Now it takes the shader stage and the total_scratch instead of anv_shader because
cmd_buffer_trace_rays() don't have a shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
We will need to call get_scratch_surf() from other files, so here removing the
static and adding it to anv_private.h.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40832>
We are already at our limit of 31 texture opcodes, and cannot add any
more without expanding the opcode hashing in nir_instr_set. Thankfully,
it's at 29 bits, so adding one here is possible still.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
New on Xe2, this instruction enables faster 32x32 integer multiply at the cost
of extra accumulator usage. Add it to the opcode list for future use.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
A set is large and expensive to iterate.
This is faster (overall fossilize-replay difference):
Difference at 95.0% confidence
-250 +/- 28.9257
-2.04849% +/- 0.235211%
(Student's t, pooled s = 34.1626)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
This would have been fine for a set, but we're going to use a dynarray for
the predecessors in the future.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
If both src and dst are compressed formats, adjusting the extent isn't
necessary because it's required that texel block extent matches. The
previous division was also wrong because it was truncating partial
blocks causing issues in some tests.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40678>
Fix non-functional issue where calls to cs_run_fragment() had swapped
tile_order and enable_tem arguments. Both arguments evaluate to 0.
Hence, no functional change.
Fixes: 53f780ec91 ("panfrost: Remove progress_increment from all CS builders")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40840>
Replace the old partial IR state snapshotting with prebuilt
per-pass IR descriptor buffers and a queue-attached scratch FBD.
1. emit FBD+DBD+RTDs for each IR-pass+layer
2. store it into some side-band GPU buffer that's passed around
through the queue context
3. in the execption handler, copy FBD+DBD+RTDs from the IR desc
buffer to some space that's attached to the queue context and
not the cmdbuf
Fixes: 46f611c9 ("panvk: Also use resolve shaders for Z/S")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Ryan zhang <ryan.zhang@nxp.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40625>
We have 11 new RPL-U Brya Chromebooks in the Collabora lab, allowing the
full VKCTS test suite to run pre-merge for the first time without a
fraction.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40782>
qualityLevel as a speed/quality tradeoff is better match for encode presets
than the tuning mode. In addition we can support more quality levels than
what is available from tuning modes, this adds support for High Quality
preset on VCN4+.
This also makes it possible to use Low Latency tuning with balance and
quality presets.
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40765>
By using the constant path we can combine the v_and and the v_cmp.
Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 2833 -> 2831 (-0.07%)
Latency: 27385 -> 27367 (-0.07%)
InvThroughput: 1712 -> 1710 (-0.12%)
VALU: 1301 -> 1299 (-0.15%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705>
The pvr_clear_vdm_state_get_size_in_dw() wrongly think instance count
inputs are needed when doing RTA clear for cores without the
gs_rta_support feature. However, the instance ID is exploited to output
the target layer ID, which isn't supported at all for cores w/o that
feature, so it looks that the condition is inverted. In addition, the
pvr_pack_clear_vdm_state() function seems to have similar logic deciding
whether to emit instance_count, and the logic is opposite to the logic
in pvr_clear_vdm_state_get_size_in_dw() for the part checking the
gs_rta_support feature.
Invert the condition to take instance ID inputs for cores with the
gs_rta_support feature instead of those without this feature.
Fixes: b59eb30e88 ("pvr: Fix cs corruption in pvr_pack_clear_vdm_state()")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40831>
Those were relevent for Fermi or just the Gallium driver.
For the vertex runout, it is implemented a bit after
(SET_VERTEX_STREAM_SUBSTITUTE_A)
I also rewrote the comment about CSAA_ENABLE as it is still relevent.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
NVIDIA proprietary driver set 4 for UNORM8 and SRGB8, let's match this.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
NVIDIA proprietary driver does that, we were missing this and possibly
making the VAF (Vertex Attribute Fetch) unit evict the first entry
instead if nothing was setting it.
The golden ctx already set it for us at least on Ada but for consistency
let's make sure it's set here in case this is different on other
generations.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
There is no reasons to cut 48KiB of memory out of L1 cache on gfx
considering that we do not have shared memory and that local
memory does not need to be directly addressable.
This is not set by NVIDIA proprietary driver and the golden ctx setup keep it
uninitialized.
Unsure if that will change anything in term of performance but it might reduce
L1 cache usage on 3D.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
We are already doing this in nvk_push_draw_state_init there is no need
for the extra DMA fill.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
This makes printing of BIR in SSA-form more similar to NIR and after
register allocation, it shows consecutive registers for operands
reading/writing to more than one register.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40711>
If we can't determine what variable is accessed, we should assume it could
be any.
This might make things worse with Vulkan since it does
vulkan_resource_index+load_vulkan_descriptor, but I don't think it matters
much. SSBO stores are rarely used in vertex shaders.
fossil-db (navi21):
Totals from 1 (0.00% of 202427) affected shaders:
Instrs: 442 -> 445 (+0.68%)
Latency: 2038 -> 2043 (+0.25%)
InvThroughput: 432 -> 437 (+1.16%)
VALU: 295 -> 298 (+1.02%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
runtime error: left shift of 65535 by 16 places cannot be represented in type 'int'
This fixes nir_opt_algebraic_pattern_test.bf2f.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: ecd2d2cf46 ("util: Add functions to convert float to/from bfloat16")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
The nil Rust test fails to link when built under Coverity (cov-build):
/usr/bin/ld: src/util/libmesa_util.a.p/format_u_format_other.c.o:
undefined reference to symbol 'sqrtf@@GLIBC_2.2.5'
/usr/bin/ld: /lib/x86_64-linux-gnu/libm.so.6:
error adding symbols: DSO missing from command line
This does not reproduce with plain GCC or Clang builds.
When rustc invokes the linker for the nil test binary, the generated
link command is structured as:
cc ... [Rust rlibs] -Bdynamic -lm -ldl -lc ...
-fuse-ld=lld -B.../gcc-ld ...
[static archives: libmesa_util.a ...]
The -lm appears before libmesa_util.a in both Coverity and non-Coverity
builds. With --as-needed enabled, the linker only records a shared
library as needed if it resolves an undefined symbol at the point it
is encountered. Since no symbols need -lm when it is first seen, the
outcome depends on the linker implementation:
- lld (rustc's bundled linker, used in plain builds): Tolerates
back-references from later static archives to earlier shared
libraries, so libmesa_util.a's sqrtf reference is still resolved
by the previously-seen libm.so.
- ld.bfd (GNU ld): Strict single-pass left-to-right. Once -lm is
skipped by --as-needed, it cannot satisfy sqrtf when libmesa_util.a
is processed later.
Coverity's cov-build wrapper intercepts rustc's call to the linker
and strips the -fuse-ld=lld and -B.../gcc-ld arguments, causing the
linker to fall back to the system's ld.bfd. This exposes the latent
link-order problem that lld was masking.
The underlying issue is that rustc places default libraries (-lm, -lc,
etc.) before user-specified static archives in the link command, which
is a known rustc limitation.
See also: https://github.com/rust-lang/rust/issues/154975
Fix this by passing -lm via rust_args with --no-as-needed brackets.
This forces ld.bfd to record libm as needed regardless of when it
appears on the command line, so sqrtf from libmesa_util.a is resolved
correctly under both lld and ld.bfd.
Fixes: 0920e0afb5 ("nil: Add zcull support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40793>
Without this, reg_offset will return 1024 for acc0. This causes
has_invalid_dst_region to decide that the destination region is invalid
(because 1024 != 0), and the lowering code tries to treat the floating
point accumulators as integers. It's a mess.
v2: Add and use set_gfx_platform. Suggested by Caio.
Fixes: 937373eb25 ("i965/fs: Handle fixed HW GRF subnr in reg_offset().")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40716>