This is a generalization of NAK's SSAValue and SSAValueArray structs.
But instead of depending on NAK's bespoke invariants, this depends on
something far simpler: a lower bound on the u32. As long as you can
guarantee that the maximum array length is strictly less than the
minimum u32 value that can be stored, we can pull the same trick as NAK
and generalize it into a LowerBoundedU32Array type.
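A minimal C sketch of the trick follows. It assumes a fixed capacity of 4 slots and a lower bound of 16 on every stored value. Both constants, and all names below, are hypothetical, and this is not NAK's Rust code. When the array is not full, the unused last slot holds the length. Every possible length is below the lower bound, so a last-slot value under the bound can only be a length, while a value at or above it means the array is full.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters, chosen only for illustration. */
#define LB_ARRAY_CAP 4u
#define LB_ARRAY_MIN_VALUE 16u

/* The whole trick relies on this: any length fits below the lower bound. */
_Static_assert(LB_ARRAY_CAP < LB_ARRAY_MIN_VALUE,
               "capacity must be strictly less than the minimum value");

struct lb_u32_array {
   uint32_t v[LB_ARRAY_CAP];
};

static struct lb_u32_array
lb_u32_array_new(const uint32_t *vals, uint32_t len)
{
   struct lb_u32_array a;
   assert(len <= LB_ARRAY_CAP);
   memset(&a, 0, sizeof(a));
   for (uint32_t i = 0; i < len; i++) {
      assert(vals[i] >= LB_ARRAY_MIN_VALUE);
      a.v[i] = vals[i];
   }
   /* If the last slot is not used by an element, store the length there. */
   if (len < LB_ARRAY_CAP)
      a.v[LB_ARRAY_CAP - 1] = len;
   return a;
}

static uint32_t
lb_u32_array_len(const struct lb_u32_array *a)
{
   uint32_t last = a->v[LB_ARRAY_CAP - 1];
   /* Below the bound: it is a length. At or above: it is a real element. */
   return last < LB_ARRAY_MIN_VALUE ? last : LB_ARRAY_CAP;
}

static uint32_t
lb_u32_array_get(const struct lb_u32_array *a, uint32_t i)
{
   assert(i < lb_u32_array_len(a));
   return a->v[i];
}
```

The payoff is that the length costs no extra storage: `struct lb_u32_array` is exactly four `uint32_t`s.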
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Coverity notices that there is an error case where
`nir_get_io_data_src_number` can return `-1`, which is then used to
index into an array. Since this is an exceptional case, we can just
assert here.
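The pattern is to check the `-1` sentinel before the value is used as an index. A self-contained sketch follows. `fake_get_io_data_src_number()` is a hypothetical stand-in, not the NIR helper, and the source numbering it returns is made up for illustration.

```c
#include <assert.h>

/* Hypothetical intrinsic kinds, for illustration only. */
enum fake_intrinsic { FAKE_LOAD, FAKE_STORE };

/* Hypothetical stand-in: returns the index of the data source, or -1 when
 * the intrinsic has no data source. */
static int
fake_get_io_data_src_number(enum fake_intrinsic op)
{
   return op == FAKE_STORE ? 0 : -1;
}

static unsigned
read_data_src(enum fake_intrinsic op, const unsigned *srcs)
{
   int idx = fake_get_io_data_src_number(op);
   /* -1 is an exceptional case here: catch it (in debug builds) instead of
    * silently reading srcs[-1]. */
   assert(idx >= 0);
   return srcs[idx];
}
```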
CID: 1681480
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40146>
On Xe2 and Xe3, the flushing is necessary due to aliasing of TGM data
in L1 memory (HSD 14020414266). On newer platforms, it is necessary
for proper post-format data conversion handling (HSD 22020984324).
See the Instruction_Fence page (63969) for documentation on the fact
that the threadgroup scope ignores flushes.
Thanks to Francisco Jerez and Kenneth Graunke for their help with this
patch.
v2: restrict the flushing to TGM (Lionel).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40732>
This register seems to be fairly critical for vertex processing
performance on A7XX, and was set to a suboptimal value for the
A730/A735/A740. It has now been updated to a value that maximizes
performance and aligns with the proprietary driver.
Fixes #15411
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41451>
When reloading live-ins, child intervals need to be extracted to ensure
we can add live-in phi nodes for them.
Fixes asserts with spillall for a bunch of ray_query and
ray_tracing_pipeline CTS tests:
src/freedreno/ir3/ir3_spill.c: add_live_in_phi: Assertion `entry' failed.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41756>
tu6_build_depth_plane_z_mode has a dependency on
occlusion_query_may_be_running.
Fixes: 8f5d433840 ("tu: Occlusion query counting should happen after FS that kills")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41856>
Some values were wrong, so this adds the whole table with all values fixed.
To make it easier to read and compare, I have added all shader stages to
XEHP_URB_MIN_MAX_ENTRIES.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41789>
Right now this value is not used, but it will be in the next patch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41789>
A use in src[1] or src[2] would mean that the atomic uses the deref as data
for the op; we only want to allow address source uses.
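A minimal sketch of the rule, assuming a hypothetical record of which source slot of a deref atomic each use occupies (these names are not NIR's): only a use as src[0], the address, keeps the deref eligible as a non-complex use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one use of a deref by a deref atomic intrinsic.
 * src[0] is the address; src[1]/src[2] are data operands of the op. */
struct deref_use {
   unsigned src_index;
};

/* Returns true only if every use consumes the deref as the atomic's address.
 * A use as src[1] or src[2] means the deref itself is data for the op, so
 * it must not count as a non-complex use. */
static bool
atomic_uses_are_address_only(const struct deref_use *uses, size_t count)
{
   assert(uses != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (uses[i].src_index != 0)
         return false;
   }
   return true;
}
```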
Fixes: bb311ce370 ("nir: Allow atomics as non-complex uses for var-splitting passes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41818>
These formats are not supported natively on gfx20+. However, with a
driconf option enabled, we do create surfaces with these formats and use
them for transfer and decompression operations. Provide a CMF for these
formats to avoid hitting the unreachable in
isl_get_render_compression_format().
Fixes: 27d515772e ("intel/isl: Replace mc_format with aux_format")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15547
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41830>
Deduplicating the winsys just for memory budget tracking looks more like
a hack than a real implementation. Rework the tracking of allocated
memory so that the dedup can be removed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41805>
As long as we round up the /alignments/ in RA, and pad to power-of-two when
calculating partitions (trivially true now, though this informs future work),
this is fine.
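For readers less familiar with the two roundings this relies on, here is a generic sketch. `next_pot()` rounds a value up to a power of two, and `align_pot()` rounds an offset up to a power-of-two alignment. Both names are hypothetical, and how Jay's RA and partitioning code actually apply them is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Round x (1 <= x <= 2^31) up to the next power of two. */
static uint32_t
next_pot(uint32_t x)
{
   assert(x >= 1 && x <= (1u << 31));
   uint32_t p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* Round x up to a multiple of a, where a must be a power of two. */
static uint32_t
align_pot(uint32_t x, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (x + a - 1) & ~(a - 1);
}
```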
SIMD16:
Totals from 1001 (37.82% of 2647) affected shaders:
Instrs: 1897734 -> 1896157 (-0.08%); split: -0.25%, +0.16%
CodeSize: 28330256 -> 28315472 (-0.05%); split: -0.30%, +0.25%
Number of spill instructions: 1003 -> 999 (-0.40%)
Number of fill instructions: 990 -> 986 (-0.40%)
SIMD32:
Totals from 1230 (46.47% of 2647) affected shaders:
Instrs: 3284649 -> 3277437 (-0.22%); split: -1.18%, +0.96%
CodeSize: 48977696 -> 48907376 (-0.14%); split: -1.10%, +0.96%
Number of spill instructions: 41004 -> 40582 (-1.03%); split: -1.05%, +0.02%
Number of fill instructions: 39298 -> 38572 (-1.85%); split: -1.91%, +0.06%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
Jay's novel SSA-based register allocator relies on a fixed partition of Intel
GRFs mapping to logical GPRs.
Previously, Jay used a simple partitioning scheme, which was good enough for
simple compute and fragment shaders, but had limitations that blocked new
feature bring-up, as well as performance issues.
Here we rewrite the Jay partitioning code at the heart of the Jay RA in
order to lift these restrictions and allow fully flexible partitions. This
should be easier to reason about, fix a bunch of issues around simd32 payloads,
enable better performance, etc.
The # of stride 16 GRFs reserved is halved in simd32 mode here to match how
multisampling stuff works, which explains the large simd32-only instruction
count reduction.
While churning all this code, I took the opportunity to break off
jay_partition.c... I think that is better organized, and the diff would have
been garbage otherwise.
SIMD16:
Totals from 2189 (82.70% of 2647) affected shaders:
Instrs: 2702159 -> 2670951 (-1.15%); split: -1.41%, +0.26%
CodeSize: 40296128 -> 39850304 (-1.11%); split: -1.40%, +0.30%
SIMD32:
Totals from 2373 (89.65% of 2647) affected shaders:
Instrs: 4559418 -> 4072897 (-10.67%); split: -10.77%, +0.10%
CodeSize: 68185488 -> 60635616 (-11.07%); split: -11.17%, +0.09%
Number of spill instructions: 44069 -> 44055 (-0.03%)
Number of fill instructions: 43292 -> 43278 (-0.03%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
Sources may be in the wrong register file for a payload, and this is what
copies them to the correct one.
For example, a 1D shadow comparison may have a UGPR coordinate but
a GPR shadow comparator. The UGPR needs to be splatted to a full
GPR because the sampler message is divergent.
Unnecessary copies should be easy to propagate away.
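A minimal model of this fix-up follows. It assumes just two register files and uses hypothetical struct and function names, so it is not the Jay code. Each payload source in the wrong file is rewritten to a fresh temporary in the required file. The copy is recorded so a later step can emit it (for example, a UGPR splatted to all lanes of a GPR) or propagate it away when it turns out to be unnecessary.

```c
/* Hypothetical register files: a UGPR holds one uniform value, a GPR holds
 * one value per SIMD lane. */
enum reg_file { FILE_UGPR, FILE_GPR };

struct payload_src {
   enum reg_file file;
   unsigned reg;
};

/* A pending copy from the original source into its replacement. */
struct payload_copy {
   struct payload_src dst, src;
};

/* Rewrites every source whose file differs from `required` to a fresh
 * temporary in `required`, recording the needed copy in copies_out.
 * Returns the number of copies recorded. */
static unsigned
legalize_payload_srcs(struct payload_src *srcs, unsigned count,
                      enum reg_file required, unsigned *next_temp,
                      struct payload_copy *copies_out)
{
   unsigned n = 0;
   for (unsigned i = 0; i < count; i++) {
      if (srcs[i].file == required)
         continue;
      struct payload_src tmp = { required, (*next_temp)++ };
      copies_out[n].dst = tmp;
      copies_out[n].src = srcs[i];
      srcs[i] = tmp;
      n++;
   }
   return n;
}
```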
Fixes 366 tests in dEQP-VK.texture.shadow.1d.* and
dEQP-VK.pipeline.monolithic.sampler.view_type.1d.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
to complete the xy/z/w fragcoord set for accurate calculations in jay without
introducing a secondary sideband just for this boolean.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
aco uses the GFX9 opcode names, and GFX8 only has the legacy 16-bit fma
that zeroes the high half of the register.
Fixes: 570bfe1ee0 ("ac: handle new float multadd opcodes")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41837>
With a few fixes applied, present_id2/wait2 extensions disabled (these
two are not recognized by Vulkan CTS 1.4.3.x) and a fix backported to
the Vulkan CTS, the driver can now pass Vulkan CTS 1.4.3.3.
Bump the conformance version to that value.
The submission link and conformant Vulkan version information in the
current PowerVR driver-specific document is also updated.
Link: https://www.khronos.org/conformance/adopters/conformant-products/vulkan#submission_981
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41786>