Follow the implementation of all other Mesa drivers and use the
primary node of the render device for VK_EXT_physical_device_drm.
Which node should be returned here has been the subject of a long
debate, but at least for Mesa drivers there is consensus that this
extension should not mix nodes from different DRM devices. So align
v3dv with the other Mesa implementations.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37584>
In the current implementation, primary_fd refers to the display
device, which can be confused with the primary node of the render
device. In a follow-up we would like to use primary_fd for the primary
node from v3d, so prepare for that by renaming it here.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37584>
Fix parsing intra only frames with profile 0. Change type to
signed int and initialize default values for ref_deltas and
mode_deltas.
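As a rough illustration of why the type must be signed: the commit does not name the codec, but assuming this is a VP9 parser, the loop filter deltas can be negative, and the VP9 specification resets them to the defaults shown below. The struct and function names are hypothetical and are not taken from the patch.

```c
#include <stddef.h>

#define NUM_REF_DELTAS  4
#define NUM_MODE_DELTAS 2

/* Hypothetical container; the deltas are signed because some defaults
 * (and values read from the bitstream) are negative. */
struct lf_deltas {
   int ref_deltas[NUM_REF_DELTAS];
   int mode_deltas[NUM_MODE_DELTAS];
};

/* Hypothetical helper: apply the VP9 default deltas (assumption: VP9).
 * Order is intra, last, golden, altref. */
static void
lf_deltas_set_defaults(struct lf_deltas *lf)
{
   static const int default_ref_deltas[NUM_REF_DELTAS] = { 1, 0, -1, -1 };

   for (size_t i = 0; i < NUM_REF_DELTAS; i++)
      lf->ref_deltas[i] = default_ref_deltas[i];
   for (size_t i = 0; i < NUM_MODE_DELTAS; i++)
      lf->mode_deltas[i] = 0;
}
```

With an unsigned field, the -1 defaults would be stored as large positive values, which is the kind of error a signed type avoids.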
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37808>
Certain subgroup operations don’t impose constraints on
CSD supergroup packing. Mark these as supported
and account for them in v3d_csd_choose_workgroups_per_supergroup()
so packing remains unchanged when they are present.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37836>
Return one work group per super group when the work group size
is a multiple of 16 (elements per batch) and recalculate max_wgs_per_sg
only when TSY barriers cut the available QPU threads.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37836>
For startup info logs behind the debug option, logging the file path and
line of code is not very useful. So use mesa_logi for simplicity.
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37825>
Ensures early algebraic passes aren't called again following late
algebraic passes, so that the latter's opts aren't undone (e.g.
unfusing ffmas).
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>
The rounding behaviour on [iu]2f32 ops needs to be explicitly set in
order to match the implicit behaviour described in the
KHR_shader_float_controls properties.
Fixes: e306abc6e6 ("pvr: implement KHR_shader_float_controls")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>
Switching from compute to 3D and vice versa leads to a long stall which
destroys compute performance. On Ampere and later, where the compute MME
was added, use it for compute dispatches. This eliminates the stalls
from sub-channel switching in these cases.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
In 03f785083f ("nvk: Reserve MME scratch area for communicating with
FALCON"), we said we reserved these but actually only reserved 0. Only
0 is actually used today but if we're going to claim to reserve
registers we should actually do it.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
These are no longer used anywhere. Moreover, it's not clear that they
can be used for a correct implementation of pipeline barriers since a
correct implementation cannot ignore execution deps in non-shader
stages.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
In each of these cases, the spec mandates that apps pair a memory barrier
specified with access with a relevant exec barrier specified by stages.
We therefore don't need to wfi based on access - the tests on stage are
sufficient.
Acked-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
We were under-synchronizing before. In particular, `stages` form
execution barriers even in the absence of a memory barrier in the
`access` flags.
The particular issue that prompted this was one where we weren't waiting
on a pipeline barrier in Baldur's Gate 3 with:
    srcStageMask == VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT &&
    srcAccessMask == VK_ACCESS_2_SHADER_READ_BIT &&
    dstStageMask == (VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT |
                     VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT) &&
    dstAccessMask == (VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                      VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT)
Based on the spec and discussion in
https://github.com/KhronosGroup/Vulkan-Docs/issues/131 the read bit in
srcAccessMask doesn't really matter here - what matters is that there's
an execution barrier on the fragment stage which needs to prevent the
fragment shader execution from overlapping with the later call's
fragment tests (which write to the depth attachment).
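A minimal sketch of the rule described above, not the actual nvk code: the decision to wait is driven by the source stage mask alone, so a barrier whose access mask contains only reads still orders execution. The bit values, the mask of work-producing stages, and the function name are all hypothetical stand-ins rather than Vulkan header definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for Vulkan stage and access bits; the real
 * values come from the Vulkan headers. */
#define STAGE_EARLY_FRAGMENT_TESTS (UINT64_C(1) << 0)
#define STAGE_FRAGMENT_SHADER      (UINT64_C(1) << 1)
#define STAGE_LATE_FRAGMENT_TESTS  (UINT64_C(1) << 2)
#define ACCESS_SHADER_READ         (UINT64_C(1) << 0)

/* Hypothetical mask of source stages that represent in-flight work. */
#define STAGES_WITH_WORK (STAGE_EARLY_FRAGMENT_TESTS | \
                          STAGE_FRAGMENT_SHADER |      \
                          STAGE_LATE_FRAGMENT_TESTS)

/* Hypothetical helper: a source stage that does work forms an execution
 * dependency on its own, so the access mask is deliberately ignored. */
static bool
barrier_needs_wait(uint64_t src_stages, uint64_t src_access)
{
   (void)src_access;
   return (src_stages & STAGES_WITH_WORK) != 0;
}
```

For the barrier quoted above, this helper returns true for a fragment shader source stage whether or not the read bit is set in the access mask.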
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13909
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
When we want to WFI, we only need to do so on a single channel. The
others will implicitly get a WFI from the channel switch.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
This is presumably the same cache across compute and 3d, so we only need
to run one of these, not two.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
Current code allocates the maximum QMD data for all generations and
uploads everything, even on generations where a smaller QMD buffer
suffices. This is not only wasteful, but actually crashes Kepler GPUs
due to complications with the QMD queue.
Only upload the useful bytes of the QMD buffer.
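The idea can be sketched as follows, assuming a scratch buffer sized for the largest layout and a per-generation count of bytes in use. The names and the 512-byte maximum are hypothetical and are not taken from the driver.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical worst-case size; not a value taken from the driver. */
#define QMD_MAX_BYTES 512

struct qmd_scratch {
   unsigned char data[QMD_MAX_BYTES]; /* sized for the largest layout */
   size_t used_bytes;                 /* bytes this generation needs */
};

/* Copy only the bytes in use rather than the whole scratch buffer.
 * Returns the number of bytes copied, or 0 if the request does not fit. */
static size_t
qmd_upload(void *dst, size_t dst_bytes, const struct qmd_scratch *qmd)
{
   if (qmd->used_bytes > QMD_MAX_BYTES || qmd->used_bytes > dst_bytes)
      return 0;

   memcpy(dst, qmd->data, qmd->used_bytes);
   return qmd->used_bytes;
}
```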
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14070
Fixes: 0e268dad00 ("nvk: Allow for larger QMDs")
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37815>
In Lua, modules (i.e. files with Lua code) are loaded using the
standard library function require(), e.g.
```
local mylib = require("mylib")
mylib.do_something()
```
require() decides where to look by consulting the `package.path`
string. By default it doesn't include the scripts directory, so running
executor from the scripts directory vs. from the root of the repo would
yield different results (require works vs. require fails to find the
module). This patch adds the scripts directory to avoid this issue.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>