A deref in src[1] or src[2] would mean that the atomic uses the deref as
data for the op. We only want to allow uses as the address source.
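
A minimal sketch of the intended check, using hypothetical toy types rather
than the real NIR structures. It assumes src[0] is the address source and
src[1]/src[2] carry data, as described above; every name here is invented
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the NIR types. */
struct toy_value {
   int id;
};

struct toy_atomic {
   unsigned num_srcs;              /* 2 for most atomics, 3 for compare-swap */
   const struct toy_value *src[3]; /* src[0] = address, src[1..2] = data */
};

/* Returns true only if the deref is used as the address (src[0]) and is
 * not also consumed as data by src[1] or src[2].
 */
static bool
toy_atomic_use_is_simple(const struct toy_atomic *atomic,
                         const struct toy_value *deref)
{
   assert(atomic->num_srcs >= 1 && atomic->num_srcs <= 3);

   /* The deref must be the address source. */
   if (atomic->src[0] != deref)
      return false;

   /* Any appearance in a data source makes the use complex. */
   for (unsigned i = 1; i < atomic->num_srcs; i++) {
      if (atomic->src[i] == deref)
         return false;
   }

   return true;
}
```

With this shape, an atomic whose data operand is the deref itself is
rejected even when its address source would otherwise qualify.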
Fixes: bb311ce370 ("nir: Allow atomics as non-complex uses for var-splitting passes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41818>
These formats are not supported natively on gfx20+. However, with a
driconf option enabled, we do create surfaces with these formats and use
them for transfer and decompression operations. Provide a CMF for these
formats to avoid hitting the unreachable in
isl_get_render_compression_format().
Fixes: 27d515772e ("intel/isl: Replace mc_format with aux_format")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15547
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41830>
Deduplicating the winsys just for budget looks more like a hack than
a real implementation. Rework the tracking of allocated memory to remove
the dedup.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41805>
As long as we round up the /alignments/ in RA, and pad to power-of-two when
calculating partitions (trivially true now, this informs future work though),
this is fine.
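
To illustrate the two operations named above (rounding an alignment up and
padding to a power of two), here is a small sketch. The helper names are
hypothetical and are not the functions used in the driver, and how they are
combined is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two that is >= x (1 for x <= 1). */
static uint32_t
toy_next_pow2(uint32_t x)
{
   assert(x <= 0x80000000u); /* larger values have no 32-bit result */

   uint32_t p = 1;
   while (p < x)
      p <<= 1;

   return p;
}

/* Hypothetical helper: round v up to a multiple of a power-of-two
 * alignment. Does not guard against overflow of v + a - 1.
 */
static uint32_t
toy_align_pot(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}
```

For example, a value spanning 3 registers would get an alignment of
toy_next_pow2(3) = 4, so a candidate base of 6 is bumped to
toy_align_pot(6, 4) = 8.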
SIMD16:
Totals from 1001 (37.82% of 2647) affected shaders:
Instrs: 1897734 -> 1896157 (-0.08%); split: -0.25%, +0.16%
CodeSize: 28330256 -> 28315472 (-0.05%); split: -0.30%, +0.25%
Number of spill instructions: 1003 -> 999 (-0.40%)
Number of fill instructions: 990 -> 986 (-0.40%)
SIMD32:
Totals from 1230 (46.47% of 2647) affected shaders:
Instrs: 3284649 -> 3277437 (-0.22%); split: -1.18%, +0.96%
CodeSize: 48977696 -> 48907376 (-0.14%); split: -1.10%, +0.96%
Number of spill instructions: 41004 -> 40582 (-1.03%); split: -1.05%, +0.02%
Number of fill instructions: 39298 -> 38572 (-1.85%); split: -1.91%, +0.06%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
Jay's novel SSA-based register allocator relies on a fixed partition of Intel
GRFs mapping to logical GPRs.
Previously, Jay used a simple partitioning scheme, which was good enough for
simple compute and fragment shaders, but has both limitations preventing new
feature bring-up and performance issues.
Here we rewrite the Jay partitioning code at the heart of the Jay RA in
order to lift these restrictions and allow fully flexible partitions. This
should be easier to reason about, fix a bunch of issues around simd32 payloads,
enable better performance, etc.
The number of stride-16 GRFs reserved is halved in simd32 mode here to match
how multisampling works, which explains the large simd32-only instruction
count reduction.
While churning all this code, I took the opportunity to break off
jay_partition.c... I think that is better organized and the diff was garbage
otherwise.
SIMD16:
Totals from 2189 (82.70% of 2647) affected shaders:
Instrs: 2702159 -> 2670951 (-1.15%); split: -1.41%, +0.26%
CodeSize: 40296128 -> 39850304 (-1.11%); split: -1.40%, +0.30%
SIMD32:
Totals from 2373 (89.65% of 2647) affected shaders:
Instrs: 4559418 -> 4072897 (-10.67%); split: -10.77%, +0.10%
CodeSize: 68185488 -> 60635616 (-11.07%); split: -11.17%, +0.09%
Number of spill instructions: 44069 -> 44055 (-0.03%)
Number of fill instructions: 43292 -> 43278 (-0.03%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
Sources may be in the wrong register file for a payload, and this is what
copies them to the correct one.
For example, a 1D shadow comparison may have a UGPR coordinate but
a GPR shadow comparator. The UGPR needs to be splatted to a full
GPR because the sampler message is divergent.
Unnecessary copies should be easy to propagate away.
Fixes 366 tests in dEQP-VK.texture.shadow.1d.* and
dEQP-VK.pipeline.monolithic.sampler.view_type.1d.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
to complete the xy/z/w fragcoord set for accurate calculations in jay without
introducing a secondary sideband just for this boolean.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
aco uses the GFX9 opcode names, and GFX8 only has the legacy 16bit fma
that zeros the high register half.
Fixes: 570bfe1ee0 ("ac: handle new float multadd opcodes")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41837>
With a few fixes applied, present_id2/wait2 extensions disabled (these
two are not recognized by Vulkan CTS 1.4.3.x) and a fix backported to
the Vulkan CTS, the driver can now pass Vulkan CTS 1.4.3.3.
Bump the conformance version to that value.
The submission link and conformant Vulkan version information in the
current PowerVR driver-specific document are also updated.
Link: https://www.khronos.org/conformance/adopters/conformant-products/vulkan#submission_981
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41786>
Vulkan applications use vkQueueSubmit2(submitCount=0) to signal
throttle fences (e.g. per-image frame-pacing fences). When SQTT
is enabled, sqtt_QueueSubmit2() skips both the bypass path and
the submit loop, so the call is never forwarded and the fence
remains unsignaled.
This causes hangs in drmSyncobjWait (WAIT_FOR_SUBMIT) after capture.
Forward submitCount==0 calls directly to the underlying
QueueSubmit2 to ensure the fence is signaled.
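
The control flow can be sketched with a toy model. All types and function
names below are hypothetical and are not the RADV or Vulkan entry points;
the per-submit loop is an assumed simplification of what the SQTT layer
does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy queue; not a driver structure. */
struct toy_queue {
   bool sqtt_enabled;
   unsigned forwarded_calls;
   bool fence_signaled;
};

/* Stand-in for the underlying QueueSubmit2: signals the fence if given. */
static int
toy_driver_submit(struct toy_queue *q, uint32_t submit_count, bool has_fence)
{
   (void)submit_count;
   q->forwarded_calls++;
   if (has_fence)
      q->fence_signaled = true;
   return 0;
}

/* Stand-in for the SQTT layer entry point. */
static int
toy_sqtt_submit(struct toy_queue *q, uint32_t submit_count, bool has_fence)
{
   /* Bypass path, plus the fix: an empty submit carries no work to
    * trace, but must still reach the driver so the fence is signaled.
    */
   if (!q->sqtt_enabled || submit_count == 0)
      return toy_driver_submit(q, submit_count, has_fence);

   /* Per-submit loop; it never runs when submit_count == 0, which is
    * why the fence stayed unsignaled before the fix.
    */
   for (uint32_t i = 0; i < submit_count; i++) {
      bool last = (i == submit_count - 1);
      int result = toy_driver_submit(q, 1, has_fence && last);
      if (result != 0)
         return result;
   }

   return 0;
}
```

Without the submit_count == 0 check, the loop body would never execute and
toy_driver_submit() would not be called at all for an empty submit.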
Signed-off-by: jyotiranjan <jyotiranjan.bhuyan@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41766>