By using the constant path we can combine the v_and and the v_cmp.
Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 2833 -> 2831 (-0.07%)
Latency: 27385 -> 27367 (-0.07%)
InvThroughput: 1712 -> 1710 (-0.12%)
VALU: 1301 -> 1299 (-0.15%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705>
pvr_clear_vdm_state_get_size_in_dw() wrongly thinks instance count
inputs are needed when doing an RTA clear on cores without the
gs_rta_support feature. However, the instance ID is exploited to output
the target layer ID, which isn't supported at all on cores w/o that
feature, so it looks like the condition is inverted. In addition, the
pvr_pack_clear_vdm_state() function seems to have similar logic deciding
whether to emit instance_count, and that logic is the opposite of the
logic in pvr_clear_vdm_state_get_size_in_dw() for the part checking the
gs_rta_support feature.
Invert the condition to take instance ID inputs for cores with the
gs_rta_support feature instead of those without this feature.
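As a rough illustration of the corrected condition (a sketch only; the
helper name and its boolean parameters are hypothetical and do not
appear in the driver):

```c
#include <stdbool.h>

/* Hypothetical helper: decides whether the clear needs instance count
 * inputs. Both parameter names are assumptions made for this sketch. */
static bool
clear_needs_instance_count(bool has_gs_rta_support, bool is_rta_clear)
{
   /* The instance ID carries the target layer ID, which only works on
    * cores that have gs_rta_support, so the feature must be present
    * rather than absent. */
   return is_rta_clear && has_gs_rta_support;
}
```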
Fixes: b59eb30e88 ("pvr: Fix cs corruption in pvr_pack_clear_vdm_state()")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40831>
Those were relevant for Fermi or just the Gallium driver.
For the vertex runout, it is implemented a bit after
(SET_VERTEX_STREAM_SUBSTITUTE_A)
I also rewrote the comment about CSAA_ENABLE as it is still relevant.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
The NVIDIA proprietary driver sets 4 for UNORM8 and SRGB8, let's match this.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
The NVIDIA proprietary driver does that. We were missing this, possibly
making the VAF (Vertex Attribute Fetch) unit evict the first entry
instead if nothing was setting it.
The golden ctx already sets it for us, at least on Ada, but for consistency
let's make sure it's set here in case this is different on other
generations.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
There is no reason to cut 48KiB of memory out of L1 cache on gfx
considering that we do not have shared memory and that local
memory does not need to be directly addressable.
This is not set by the NVIDIA proprietary driver and the golden ctx setup
keeps it uninitialized.
Unsure if that will change anything in terms of performance but it might
reduce L1 cache usage on 3D.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
We are already doing this in nvk_push_draw_state_init, so there is no
need for the extra DMA fill.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40753>
This makes printing of BIR in SSA form more similar to NIR. After
register allocation, it shows consecutive registers for operands that
read or write more than one register.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40711>
If we can't determine what variable is accessed, we should assume it could
be any.
This might make things worse with Vulkan since it does
vulkan_resource_index+load_vulkan_descriptor, but I don't think it matters
much. SSBO stores are rarely used in vertex shaders.
fossil-db (navi21):
Totals from 1 (0.00% of 202427) affected shaders:
Instrs: 442 -> 445 (+0.68%)
Latency: 2038 -> 2043 (+0.25%)
InvThroughput: 432 -> 437 (+1.16%)
VALU: 295 -> 298 (+1.02%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
runtime error: left shift of 65535 by 16 places cannot be represented in type 'int'
This fixes nir_opt_algebraic_pattern_test.bf2f.
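The error comes from integer promotion: a uint16_t operand is promoted
to int before the shift, and 65535 << 16 does not fit in a 32-bit int.
A minimal sketch of the usual remedy, assuming the conversion simply
widens the bits (the helper names below are hypothetical, not the
functions in util):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float32 bits. */
static inline uint32_t
bf16_bits_to_f32_bits(uint16_t bf)
{
   /* Cast before shifting so the shift is done on an unsigned 32-bit
    * value instead of a promoted (signed) int. */
   return (uint32_t)bf << 16;
}

/* Hypothetical helper: reinterpret the widened bits as a float. */
static inline float
bf16_bits_to_float(uint16_t bf)
{
   uint32_t bits = bf16_bits_to_f32_bits(bf);
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}
```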
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: ecd2d2cf46 ("util: Add functions to convert float to/from bfloat16")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
The nil Rust test fails to link when built under Coverity (cov-build):
  /usr/bin/ld: src/util/libmesa_util.a.p/format_u_format_other.c.o:
    undefined reference to symbol 'sqrtf@@GLIBC_2.2.5'
  /usr/bin/ld: /lib/x86_64-linux-gnu/libm.so.6:
    error adding symbols: DSO missing from command line
This does not reproduce with plain GCC or Clang builds.
When rustc invokes the linker for the nil test binary, the generated
link command is structured as:
  cc ... [Rust rlibs] -Bdynamic -lm -ldl -lc ...
     -fuse-ld=lld -B.../gcc-ld ...
     [static archives: libmesa_util.a ...]
The -lm appears before libmesa_util.a in both Coverity and non-Coverity
builds. With --as-needed enabled, the linker only records a shared
library as needed if it resolves an undefined symbol at the point it
is encountered. Since no symbols need -lm when it is first seen, the
outcome depends on the linker implementation:
- lld (rustc's bundled linker, used in plain builds): Tolerates
  back-references from later static archives to earlier shared
  libraries, so libmesa_util.a's sqrtf reference is still resolved
  by the previously-seen libm.so.
- ld.bfd (GNU ld): Strict single-pass left-to-right. Once -lm is
  skipped by --as-needed, it cannot satisfy sqrtf when libmesa_util.a
  is processed later.
Coverity's cov-build wrapper intercepts rustc's call to the linker
and strips the -fuse-ld=lld and -B.../gcc-ld arguments, causing the
linker to fall back to the system's ld.bfd. This exposes the latent
link-order problem that lld was masking.
The underlying issue is that rustc places default libraries (-lm, -lc,
etc.) before user-specified static archives in the link command, which
is a known rustc limitation.
See also: https://github.com/rust-lang/rust/issues/154975
Fix this by passing -lm via rust_args with --no-as-needed brackets.
This forces ld.bfd to record libm as needed regardless of when it
appears on the command line, so sqrtf from libmesa_util.a is resolved
correctly under both lld and ld.bfd.
Fixes: 0920e0afb5 ("nil: Add zcull support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40793>
Without this, reg_offset will return 1024 for acc0. This causes
has_invalid_dst_region to decide that the destination region is invalid
(because 1024 != 0), and the lowering code tries to treat the floating
point accumulators as integers. It's a mess.
v2: Add and use set_gfx_platform. Suggested by Caio.
Fixes: 937373eb25 ("i965/fs: Handle fixed HW GRF subnr in reg_offset().")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40716>
... to ensure that "mSyncHelper->close()" still happens on
non goldfish devices.
Android equivalent ag/39315505 for b/500332164
Tested with CtsGraphicsTestCases after the build file changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40836>
On Ampere B and later, we can specify the prefetch size in blocks of a
gfx shader we are binding.
The NVIDIA proprietary driver always sets it to the maximum possible
size (up to 127 blocks).
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40700>
We are going to need the total shader size (without embedded data), so
let's move this out of the upload codepath.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Tested-by: Thomas H.P. Andersen <phomes@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40700>
Integer accumulators and float accumulators do not occupy the same bits,
so the types cannot be arbitrarily changed.
No shader-db or fossil-db changes on any Intel platform.
v2: Use is_accumulator() instead of brw_reg_is_arf(). Add an extra test
to show the desired behavior when an accumulator is not
involved. Suggested by Caio.
Fixes: 64c251bb3a ("intel/fs: Combine constants for SEL instructions too")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40638>
Remove the unused vishandle pointer and rely solely on visualid-based
matching. This also eliminates the leak.
This mirrors the cleanup previously done in fakeglx.c. (781232e0ac)
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40422>
After abe6d750e5, glXDestroyContext() can defer destruction by marking
the context with xid == None while it is still current.
However, the release-current path did not clear current->currentDpy,
so a context that had already been marked for deletion could remain
associated with a display after unbinding.
Fixes: abe6d750e5 ("xlib: fix glXDestroyContext in Gallium frontends")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14947
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40422>
AC_TRACKED_DB_PA_SC_VRS_OVERRIDE_CNTL can be used instead because
the DB and PA registers are mutually exclusive.
Two definitions are moved because a run of consecutive enums isn't
allowed to cross a multiple of 32, due to static assertions in the bitset.
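A sketch of the kind of constraint involved, assuming a bitset made of
32-bit words where a range is set with a single mask. The macro and
function names are hypothetical and are not the ones used by the bitset
code:

```c
#include <stdint.h>

/* Hypothetical check: true if bits [first, first + count) all live in
 * the same 32-bit word of a bitset. */
#define RANGE_IN_ONE_WORD(first, count) \
   ((first) / 32 == ((first) + (count) - 1) / 32)

/* Hypothetical helper: mask for a range that fits in one word.
 * count must be in 1..32. */
static inline uint32_t
range_mask(unsigned first, unsigned count)
{
   uint32_t bits = count == 32 ? 0xffffffffu : (1u << count) - 1u;
   return bits << (first % 32);
}

/* Bits 30..31 fit in word 0, bits 31..32 would straddle two words. */
_Static_assert(RANGE_IN_ONE_WORD(30, 2), "range must not cross a word");
_Static_assert(!RANGE_IN_ONE_WORD(31, 2), "straddling range is rejected");
```

Moving an enum so that its run starts at or ends before a multiple of
32 keeps a check like this satisfied.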
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40586>
Since there's no predicate, the inverse bit is not relevant, so always
set it to false instead of using whatever was set by the previous
instruction. Hardware already ignores this bit, but clearing it will
make verifying later changes easier.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40800>
When per-primitive padding is needed, max_push_buffers is set to 3
(instead of 4) to reserve the last slot for it.
The assert was requiring `n_push_ranges < max_push_buffers`, which
incorrectly fired when the 3 ranges were used.
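A minimal sketch of the bound, assuming the corrected check is simply
inclusive (the helper is hypothetical, only the two variable names come
from the text above):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the asserted condition. */
static bool
push_ranges_fit(unsigned n_push_ranges, unsigned max_push_buffers)
{
   /* Using every available slot is valid, so the comparison must be
    * inclusive. A strict '<' rejects 3 ranges when the limit is 3. */
   return n_push_ranges <= max_push_buffers;
}
```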
Fixes: a8ba682919 ("anv: assert we haven't gone over the maximum number of push_buffers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15155
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40803>
This saves a conversion or two.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40829>
On new platforms, it's valid to use a NULL destination in conjunction with a
cmod, where you care about the implicit flag write but you don't need to clobber
any GRF. Something like:
   if (x * y > z) {
compiling (with fast-math) to
   mad.gt.f0 _, -z, x, y
   (f0) if
This patch allows us to emit that instruction.
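To illustrate why the mad form is equivalent under fast-math, here is a
small sketch in C. The function names are hypothetical, and the mad is
modelled as a plain multiply-add followed by a compare against zero:

```c
#include <stdbool.h>

/* The source-level condition. */
static bool
cond_direct(float x, float y, float z)
{
   return x * y > z;
}

/* Model of "mad.gt.f0 _, -z, x, y": compute -z + x * y, discard the
 * value (NULL destination) and keep only the "> 0" flag result. */
static bool
cond_mad(float x, float y, float z)
{
   float discarded = -z + x * y;
   return discarded > 0.0f;
}
```

The two forms can differ for inputs such as infinities or NaN, which is
why the rewrite is only done with fast-math.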
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40829>
This lets us treat it as a packed data structure without worrying about garbage.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40829>
It's required by descriptor heap. There is already a NIR pass that
optimizes non-uniform access, so this should be mostly safe.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40768>
This isn't required for deref instructions because it's possible to
get the image format back from the variable but it will be useful for
descriptor heap.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40768>