Ensures early algebraic passes aren't called again following late
algebraic passes, so that the latter's opts aren't undone (e.g.
unfusing ffmas).
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>
The rounding behaviour on [iu]2f32 ops needs to be explicitly set in
order to match the implicit behaviour described in the
KHR_shader_float_controls properties.
Fixes: e306abc6e6 ("pvr: implement KHR_shader_float_controls")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>
Switching from compute to 3D and vice versa leads to a long stall which
destroys compute performance. This switches to the compute MME on Ampere
onwards (which was where it was added) for compute dispatches which eliminates
stalling from sub-channel switching in these cases.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
In 03f785083f ("nvk: Reserve MME scratch area for communicating with
FALCON"), we said we reserved these but actually only reserved 0. Only
0 is actually used today but if we're going to claim to reserve
registers we should actually do it.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
These are no longer used anywhere. Moreover, it's not clear that they
can be used for a correct implementation of pipeline barriers since a
correct implementation cannot ignore execution deps in non-shader
stages.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
In each of these cases, the spec mandates that apps pair a memory barrier
specified with access with a relevant exec barrrier specified by stages.
We therefore don't need to wfi based on access - the tests on stage are
sufficient.
Acked-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
We were under-synchronizing before. In particular, `stages` form
execution barriers even in the absence of a memory barrier in the
`access` flags.
The particular issue that prompted this was one where we weren't waiting
on a pipeline barrier in Baldur's Gate 3 with:
srcStageMask == VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT &&
srcAccessMask == VK_ACCESS_2_SHADER_READ_BIT &&
dstStageMask == (VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT |
VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT) &&
dstAccessMask == (VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT)
Based on the spec and discussion in
https://github.com/KhronosGroup/Vulkan-Docs/issues/131 the read bit in
srcAccessMask doesn't really matter here - what matters is that there's
an execution barrier on the fragment stage which needs to prevent the
fragment shader exection from overlapping with the later call's
fragment tests (which write to the depth attachment).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13909
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
When we want to WFI, we only need to do so on a single channel. The
others will implicitly get a WFI from the channel switch.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
This is presumably the same cache across compute and 3d, so we only need
to run one of these, not two.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>
Current code allocates the maximum QMD data for all generations and
uploads everything, even on generations where a smaller QMD buffer
suffices. This is not only wasteful, but actually crashes Kepler GPUs
due to complications with the QMD queue.
Only upload the useful bytes of the QMD buffer.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14070
Fixes: 0e268dad00 ("nvk: Allow for larger QMDs")
Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37815>
In Lua, modules (i.e. files with lua code) are loaded by using
the standard library require(), e.g.
```
local mylib = require("mylib")
mylib.do_something()
```
The require() will decide where to look by peeking at `package.path`
table. By default it doesn't include the scripts directory, so running
executor from the script directory vs. from the root of the repo would
yield different results (require works vs. require fail to find the
module). This patch includes the script directory to avoid this issue.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>
This will avoid assertion failures about a size==0 in the upcoming change
to regmask bitset handling, when collect_info() usees them to track
references into the current alias table. We know that relative accesses
won't go to the alias table, but that code doesn't.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37777>
First this is only possible on RCS or CCS engines.
Second if on CCS, we need to use a compute shader, 3D won't work.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37818>
As noted, the flag we allocate with controls whether *anyone* can implicit
sync on the BO through amdgpu interfaces, not just whether our fd does.
This restores radv to the behavior before the regressing commit.
Fixes: 4dcf32c56e ("wsi/drm: Don't request implicit sync if we're doing implicit sync ourselves.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37772>
Prior to 4dcf32c56e, radv was getting a request for implicit sync, even
when we were doing the work to do implicit sync in the WSI. Once that was
turned off, we incidentally dropped flagging WSI's mem->buffer as
uncached, due to it being under the wrong condition.
Fixes: 4dcf32c56e ("wsi/drm: Don't request implicit sync if we're doing implicit sync ourselves.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37772>
cmod propagation needs more work. Since the result type is always UD,
BRW_CONDITION_G should be able to substitute for NZ. Either that or
users of the condition could be rewritten to use an inverted condition.
v2: Add a couple more unit tests. Suggested by Matt.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>