This was originally disabled by a22ad99bdd ("pvr: set device
features/props/extensions to Vulkan 1.0 minimums (unless implemented)") in order
to concentrate efforts on passing "base" Vulkan conformance before layering on
additional functionality. The driver is now Vulkan 1.2 conformant.
As the functionality is already implemented, simply enable the extension.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41859>
The Vulkan spec allows this to be used in a mesh shader as long as
it's not accessed, so it can be eliminated.
This fixes dEQP-VK.mesh_shader.ext.misc.payload_not_accessed.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41828>
This adds an F16 struct which provides a 16-bit float type using Mesa's
existing half-precision support internally. Right now it only contains
the basics, but it could be expanded if needed.
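As a rough illustration of what such a type wraps, the C sketch below decodes IEEE 754 binary16 bits into a float. It is a generic conversion with a hypothetical name, `half_bits_to_float()`, and is not Mesa's own half-float helper or the F16 struct's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode IEEE 754 binary16 bits into a 32-bit float. */
static float half_bits_to_float(uint16_t h)
{
   uint32_t sign = (uint32_t)(h & 0x8000) << 16;
   uint32_t exp = (h >> 10) & 0x1f;
   uint32_t mant = h & 0x3ff;
   uint32_t bits;

   if (exp == 0x1f) {
      /* Infinity or NaN: all-ones exponent, mantissa carried over. */
      bits = sign | 0x7f800000u | (mant << 13);
   } else if (exp != 0) {
      /* Normal number: rebias the exponent from 15 to 127. */
      bits = sign | ((exp + 112) << 23) | (mant << 13);
   } else if (mant == 0) {
      /* Signed zero. */
      bits = sign;
   } else {
      /* Subnormal half: normalize it, since it is a normal float. */
      exp = 113;
      while (!(mant & 0x400)) {
         mant <<= 1;
         exp--;
      }
      mant &= 0x3ff;
      bits = sign | (exp << 23) | (mant << 13);
   }

   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}
```

Every binary16 value is exactly representable as a float, so this direction needs no rounding; the reverse conversion is where rounding modes come into play.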
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41375>
genX(batch_emit_vertex_input) reserves 3DSTATE_VERTEX_ELEMENTS and then
writes into that reserved memory. Any later anv_batch_emit() may
allocate a new batch and finalize the previous one, running the valgrind
defined-memory check over it; any reserved entries not yet written at that
point are reported as undefined.
Fill the draw-parameter and dynamic VERTEX_ELEMENT_STATE entries before
emitting 3DSTATE_VF_INSTANCING.
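The ordering constraint can be sketched with a toy batch model. Everything here (`struct batch`, `batch_reserve()`, `batch_emit()`, `emit_vertex_input()` and the 0xCAFE placeholder packet) is hypothetical and far simpler than anv's real batch code; the model conservatively treats every emit as one that might finalize the batch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BATCH_DWORDS 64

/* Hypothetical toy batch, not anv's. */
struct batch {
   uint32_t dw[BATCH_DWORDS];
   unsigned used;      /* dwords reserved or emitted so far */
   unsigned finalized; /* dwords that have been finalized */
};

/* Reserve n dwords; the caller fills them in later. */
static uint32_t *batch_reserve(struct batch *b, unsigned n)
{
   assert(b->used + n <= BATCH_DWORDS);
   uint32_t *p = &b->dw[b->used];
   b->used += n;
   return p;
}

/* Emit one dword. Standing in for an emit that may start a new batch, this
 * model finalizes everything emitted so far before appending, which is the
 * point where a valgrind-style definedness check would run. */
static void batch_emit(struct batch *b, uint32_t value)
{
   assert(b->used < BATCH_DWORDS);
   b->finalized = b->used;
   b->dw[b->used++] = value;
}

/* Fixed ordering: fill the reserved entries before the next emit. */
static void emit_vertex_input(struct batch *b, const uint32_t *elems, unsigned n)
{
   uint32_t *ve = batch_reserve(b, n);
   memcpy(ve, elems, n * sizeof(*ve));
   batch_emit(b, 0xCAFEu); /* placeholder for 3DSTATE_VF_INSTANCING */
}
```

Swapping the memcpy() and batch_emit() calls would finalize the reserved dwords while they are still unwritten, which is the situation the fix avoids.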
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41790>
Linux eventfds hold a 64-bit counter that can be incremented by arbitrary
amounts, and waiting returns a numeric value that consumers may actually
need to read.
Also, reading/waiting does mutate kernel state, so make it take &mut self,
like reading from a std::fs::File does.
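The counter semantics described above can be seen with the C eventfd API from <sys/eventfd.h>: two writes add arbitrary amounts, and a read (without EFD_SEMAPHORE) returns the accumulated sum and resets the counter. The helper `eventfd_sum_demo()` is a hypothetical name for illustration; this is not the Rust wrapper the commit changes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical demo: add a and b to a fresh eventfd counter and read it back.
 * Returns the value read (or -1 on error) and sets *drained to 1 if a second,
 * non-blocking read finds the counter already reset. */
static int64_t eventfd_sum_demo(uint64_t a, uint64_t b, int *drained)
{
   int64_t result = -1;
   eventfd_t value = 0, again = 0;

   *drained = 0;
   int fd = eventfd(0, EFD_NONBLOCK);
   if (fd < 0)
      return -1;

   /* Each write adds its value to the 64-bit counter. */
   if (eventfd_write(fd, a) == 0 && eventfd_write(fd, b) == 0 &&
       /* Without EFD_SEMAPHORE, a read returns the whole counter and resets it. */
       eventfd_read(fd, &value) == 0) {
      result = (int64_t)value;
      /* The read mutated kernel state: the counter is now 0, so this fails with EAGAIN. */
      *drained = eventfd_read(fd, &again) == -1 && errno == EAGAIN;
   }

   close(fd);
   return result;
}
```

Because the first read consumes the value, a wrapper that discards it loses information, and the empty second read is the kernel-side mutation that motivates taking &mut self.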
Signed-off-by: Val Packett <val@invisiblethingslab.com>
Signed-off-by: Gurchetan Singh <gurchetan.singh.foss@gmail.com>
Reviewed-by: David Gilhooley <djgilhooley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41754>
Apparently GRAS_CL_INTERP_CNTL has two fields, FACENESS and CENTERRHW,
which allow us to avoid enabling the IJ_LINEAR_PIXEL input. This can
improve performance in trivial cases by ~50%.
Mirrors the Turnip change.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41848>
Apparently GRAS_CL_INTERP_CNTL has two fields, FACENESS and CENTERRHW,
which allow us to avoid enabling the IJ_LINEAR_PIXEL input. This can
improve performance in trivial cases by ~50%.
Found via gpu-ratemeter bench: vk.pix.noaa.1flat.face
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41848>
Coverity notes an error case where `nir_get_io_data_src_number` can
return `-1`, which is then used to index into an array. Since that case
is exceptional, we can simply assert here.
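The pattern looks roughly like this. `find_data_src()` and `read_data_src()` are hypothetical stand-ins that mimic the shape of the problem (a lookup that can return -1, followed by an array index); they are not the actual NIR code.

```c
#include <assert.h>

/* Hypothetical lookup that, like nir_get_io_data_src_number(), can return
 * -1 in an error case. */
static int find_data_src(int kind)
{
   return kind == 0 ? 1 : -1;
}

static int read_data_src(const int *srcs, int kind)
{
   int idx = find_data_src(kind);
   /* Callers never reach this with a kind that has no data source, so the
    * -1 case is exceptional; assert instead of indexing srcs[-1]. */
   assert(idx >= 0);
   return srcs[idx];
}
```

Note that assert() compiles out under NDEBUG, so this documents an invariant rather than adding a runtime check to release builds.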
CID: 1681480
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40146>
On Xe2 and Xe3, the flushing is necessary due to aliasing of TGM data
in L1 memory (HSD 14020414266). On newer platforms, it is necessary
for proper post-format data conversion handling (HSD 22020984324).
See the Instruction_Fence page (63969), which documents that the
threadgroup scope ignores flushes.
Thanks to Francisco Jerez and Kenneth Graunke for their help with this
patch.
v2: restrict the flushing to TGM (Lionel).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40732>
This register seems to be fairly critical on A7XX for vertex processing
performance, and was set to a suboptimal value for the A730/A735/A740.
It has now been updated to a value that maximizes performance and
matches the proprietary driver.
Fixes #15411
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41451>
When reloading live-ins, child intervals need to be extracted to ensure
we can add live-in phi nodes for them.
Fixes asserts with spillall for a bunch of ray_query and
ray_tracing_pipeline CTS tests:
src/freedreno/ir3/ir3_spill.c: add_live_in_phi: Assertion `entry' failed.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41756>
tu6_build_depth_plane_z_mode has a dependency on
occlusion_query_may_be_running.
Fixes: 8f5d433840 ("tu: Occlusion query counting should happen after FS that kills")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41856>
Some values were wrong, so this adds the whole table with all values corrected.
To make it easier to read and compare, I have added all shader stages to
XEHP_URB_MIN_MAX_ENTRIES.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41789>
Right now this value is not used, but it will be in the next patch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41789>
A use in src[1] or src[2] would mean that the atomic uses the deref as data
for the op; we only want to allow uses as the address source.
Fixes: bb311ce370 ("nir: Allow atomics as non-complex uses for var-splitting passes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41818>
These formats are not supported natively on gfx20+. However, with a
driconf option enabled, we do create surfaces with these formats and use
them for transfer and decompression operations. Provide a CMF for these
formats to avoid hitting the unreachable in
isl_get_render_compression_format().
Fixes: 27d515772e ("intel/isl: Replace mc_format with aux_format")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15547
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41830>
Deduplicating the winsys just for the budget looks more like a hack than
a real implementation. Rework the tracking of allocated memory to remove
the dedup.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41805>
As long as we round up the /alignments/ in RA and pad to a power of two when
calculating partitions (trivially true now, though this informs future work),
this is fine.
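For reference, the two rounding operations mentioned above can be written as below. These are generic helpers with hypothetical names (`round_up_pow2()`, `align_up()`), not Jay's actual code.

```c
#include <assert.h>

/* Round x up to the next power of two; valid for 1 <= x <= 2^31. */
static unsigned round_up_pow2(unsigned x)
{
   unsigned p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* Round x up to a multiple of align, where align is a power of two. */
static unsigned align_up(unsigned x, unsigned align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (x + align - 1) & ~(align - 1);
}
```

Padding a partition size to a power of two guarantees it is also a multiple of any smaller power-of-two alignment, which is why the two conditions compose.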
SIMD16:
Totals from 1001 (37.82% of 2647) affected shaders:
Instrs: 1897734 -> 1896157 (-0.08%); split: -0.25%, +0.16%
CodeSize: 28330256 -> 28315472 (-0.05%); split: -0.30%, +0.25%
Number of spill instructions: 1003 -> 999 (-0.40%)
Number of fill instructions: 990 -> 986 (-0.40%)
SIMD32:
Totals from 1230 (46.47% of 2647) affected shaders:
Instrs: 3284649 -> 3277437 (-0.22%); split: -1.18%, +0.96%
CodeSize: 48977696 -> 48907376 (-0.14%); split: -1.10%, +0.96%
Number of spill instructions: 41004 -> 40582 (-1.03%); split: -1.05%, +0.02%
Number of fill instructions: 39298 -> 38572 (-1.85%); split: -1.91%, +0.06%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>
Jay's novel SSA-based register allocator relies on a fixed partition of Intel
GRFs mapping to logical GPRs.
Previously, Jay used a simple partitioning scheme, which was good enough for
simple compute and fragment shaders but had limitations that prevented new
feature bring-up, as well as performance issues.
Here we rewrite the partitioning code at the heart of the Jay RA to lift
these restrictions and allow fully flexible partitions. The new code should be
easier to reason about, fix a number of issues around SIMD32 payloads, and
enable better performance.
The number of stride-16 GRFs reserved is halved in SIMD32 mode to match how
multisampling works, which explains the large SIMD32-only instruction count
reduction.
While churning all this code, I took the opportunity to split out
jay_partition.c. I think that is better organized, and the diff was unreadable
otherwise.
SIMD16:
Totals from 2189 (82.70% of 2647) affected shaders:
Instrs: 2702159 -> 2670951 (-1.15%); split: -1.41%, +0.26%
CodeSize: 40296128 -> 39850304 (-1.11%); split: -1.40%, +0.30%
SIMD32:
Totals from 2373 (89.65% of 2647) affected shaders:
Instrs: 4559418 -> 4072897 (-10.67%); split: -10.77%, +0.10%
CodeSize: 68185488 -> 60635616 (-11.07%); split: -11.17%, +0.09%
Number of spill instructions: 44069 -> 44055 (-0.03%)
Number of fill instructions: 43292 -> 43278 (-0.03%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41808>