The FD being waited on in `kgsl_syncobj_wait` was the device FD
instead of the FD of the syncpoint, this was entirely incorrect
and would result in waiting on FD kgsl syncobjs being entirely
broken.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27267>
VK_ACCESS_2_SHADER_STORAGE_READ_BIT specifies read access to a
storage buffer, physical storage buffer, storage texel buffer, or
storage image in any shader pipeline stage.
Any storage buffers or images written to must be invalidated and
flushed before the shader can access them.
This fixes the following tests on LNL:
- dEQP-VK.synchronization2.op.single_queue.barrier.write\*_specialized_access_flag
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27212>
EGL_EXT_create_context_robustness provides separate knobs for device
reset strategy and robust buffer access. As there is no separate query
for both piecies of functionality, devices which do not support robust
buffer access need to reject contexts created with that flag with
EGL_BAD_CONFIG.
Given that EGL can't do cap queries, we create a fake extension entry in
the EGLDisplay to cover whether the device can do robust buffer access
or just device-reset queries.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26972>
While enumerating devices on a system with multiple implementations,
unnecessary ioctls will be issued before a driver checks if it supports a
given device.
This patch makes the driver fail early based on a intel_device_info.ver
check with 2 new parameters added to intel_get_device_info_from_fd.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27166>
This adds a few new fields in the brw_cs_prog_data struct and then
uses them to fill in the relevant COMPUTE_WALKER fields.
Although the Tile Layout field theoretically has different settings for
32/64/128bpe, it appears that the recommended programming is to always
pick either TileY 32bpe or Linear. It's not very practical to look at
the surface formats involved, anyway.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27167>
We'll want to check for Alchemist and set various prog_data fields
in the next patch, in order to enable some optimizations. Passing
NULL for prog_data will remain valid and continue working as before.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27167>
This has been stubbed since 2019, but wasn't advertised or implemented.
Neither kernel offers us any interface for tracking evictions, but we
can at least report the size of various heaps.
Fixes various SPECviewperf subtests on Alchemist.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27168>
Our actual timestamp register from the hardware is 36-bits these days.
Depending on the clock base, a few of the low bits may be discarded.
We scale by 52.08 or so (see intel_device_info_timebase_scale()) in
order to convert clock ticks to nanoseconds. This value could take
~43-44 bits to represent. When our timestamp register overflows, the
reported value would overflow at some value which isn't a power-of-two.
For some reason, when we implemented ARB_timer_query for i965 in 2012,
we thought applications would use GL_QUERY_COUNTER_BITS to detect and
handle overflow, expecting our timer queries to overflow at a power-of-
two value. We tried to hack around this by reporting a 36-bit counter,
for what was then a 32-bit(?) timestamp register, with a timebase scale
of 80(?)...and reported the nanoseconds modulo 2^36, with some
handwaving about wrap around times being close enough.
I don't think this is what anyone wants. All other Gallium drivers
(except maybe zink) report 64 here, as does the Intel Windows driver.
The ARB_timer_query spec defines it as:
"If <pname> is QUERY_COUNTER_BITS, the implementation-dependent
number of bits used to hold the query result for <target> will be
placed in <params>. The number of query counter bits may be zero,
in which case the counter contains no useful information."
and it also mentions about overflow:
"If the elapsed time overflows the number of bits, <n>, available to
hold elapsed time, its value becomes undefined. It is recommended,
but not required, that implementations handle this overflow case by
saturating at 2^n - 1."
There's nothing about roll-over happening at power-of-two times, just
that the value returned has to fit in that many bits, and if the value
were to exceed that, it's undefined, optionally saturated to the maximum
representable value.
This patch makes us report 64 like other drivers, and stop taking the
modulus. Technically, our roll-over will happen before we reach the
number of query counter bits, which may or may not be valid. But this
does let us report longer times, and should be more desirable behavior.
Tested-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26058>
num_syncs was being incremented by one if 'utrace_submit != NULL' but
a sync was only being set if also
'util_dynarray_num_elements(&utrace_submit->batch_bos) == 0'.
This mismatch could cause application to abort due to
'assert(count == num_syncs)'.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27244>
This causes lavapipe to use the split code and fixes accuracy
for CTS.
Fixes dEQP-VK.glsl.builtin.precision_fconvert.f64_to_f16*
Cc: mesa-stable
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27228>
With regards to implicit masking of the shift counts for 8- and 16-bit
types, the PRMs are incorrect. They falsely state that on Gen9+ only the
low bits of src1 matching the size of src0 (e.g., 4-bits for W or UW
src0) are used. The Bspec (backed by data from experimentation) state
that 0x3f is used for Q and UQ types, and 0x1f is used for **all** other
types.
To match the behavior expected for the NIR opcodes, explicit masks for
8- and 16-bit types must be added.
This fixes (the updated version, see crucible!138) of
func.shader.shift.int16_t on all Intel platforms. According to Karol,
this also fixes "integer_ops integer_rotate" tests in OpenCL CTS.
No shader-db or fossil-db changes on any Intel platform.
Tested-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23001>