We requires this to be aligned to a 8 byte granuality.
This is something that came up with mesh shader enablement so let's
avoid this footgun with some assertion.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
Test Kepler as it was commented out but everything is
running fine.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
Not sure why it was missing but it should be part of it.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
Needed for some FSR macro changes I want to test.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 7d6cc15ab8 ("nvk/mme: Add a unit test framework for driver macros")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
This comes up in lowered load_ubo sequences (observed in OpenCL test
test_api min_max_parameter_size). Hopefully the pack gets coalesced,
it's like nir_op_vec2 on most backends, so it should usually be ok to
sink even though the register pressure heuristic will reject it.
Allowing it to sink allows the UBO load to sink.
Intel's backend scheduler can optimize the relevant sequences locally
but there should still be a win here for global load sinking.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
Those values trace back to 2015, pre Vulkan 1.0 release. I have no
idea why it was set to this, except maybe the HALIGN_128 of
RENDER_SURFACE_STATE.
Anyway, discussing this with Nanley, we don't think 128bytes is more
optimal than 64bytes. Nanley suggested the lowest value could be
16bytes for the fixed functions inside the GPU (sampler, dataport),
but a cacheline probably makes more sense for the memory interface.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40363>
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
-201983 +/- 1064.87
-9.31575% +/- 0.0468505%
(Student's t, pooled s = 1257.67)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
-4484.55 +/- 1288.68
-0.205419% +/- 0.0589514%
(Student's t, pooled s = 1521.99)
This should also use less memory.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
The special case of CS stateobjs can be shared across contexts/threads.
In all other cases, they are private to a single ctx, and it is safe to
make single-threaded assumptions when it comes to fast-paths.
See also commit b74a07a422 ("freedreno/a6xx: Avoid touching long lived
stateobj refcnt").
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40352>
For 3d draws, we have a per-ctx cache, which ensures program stateobjs
are not shared between contexts/threads. We don't have this for compute
shaders.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40352>
Since shader CSOs can be shared across contexts, we need the
corresponding stateobj to be shareable across contexts. Otherwise
different ctxs could be racing with each other to build the stateobj.
Prep for next patch.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40352>
Ignore batches from other contexts (we provide no-cross-context
guarantees) and already flushed zombie batches that haven't yet
been removed from the batch cache table.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40352>
Otherwise there is a race condition between the variant creation and
upload (when a shader is shared between contexts) where the variant
has been created, but not yet uploaded.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40352>
st_texture_set_sampler_view() currently allows only one samplerview for
a given texobj per context. in a scenario where the same texobj is
bound multiple times with different samplerviews (e.g., SRGB) for the
same draw like
samplerviews[] = {view0, view1}
then st_texture_set_sampler_view() will release view0 while creating view1
before either view is actually set to the driver, and then the driver will explode
this is gross, but the best solution which avoids infinite memory ballooning
from bufferview offsets is to pass through the array of views during creation
to ensure that the cache doesn't try to prune a view it just created
caught by Left 4 Dead 2
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15045
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40353>
Currently the fixed function vertex shader is built as io_lowered
shaders; however the gl_nir_add_point_size() function currently expects
the original shader to be not io_lowered, and this function is called to
lower the fixed function vertex shader.
Add support for adding point size store_output intrinsics for io_lowered
shaders.
This fixes fixed function rendering on Zink with a Vulkan driver w/o
VK_KHR_maintence5.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40373>
Replace loop with macros
Rewrite channel op to multi channel select to avoid extra swizzle
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Signed-off-by: Radu Costas <radu.costas@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40320>
The correct dependence is cs_flush_caches.cs_defer.signal to
signal cs_sync32_set.cs_defer.wait in occulusion query path.
Fixes: 443ddac ("panvk/csf: merge v10 and v11 paths in
issue_fragment_jobs")
Fixed: many random fail cases in VK-GL-CTS 1.4.4.2, eg.
dEQP-VK.query_pool.occlusion_query.get_results_conservative
_size_64_wait_query_without_availability_draw_points_clear_color
Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40368>
This means we don't run into undefined behavior when testing nan/inf inputs.
Also make sure that patterns using is_only_used_as_float are signed zero correct.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
The commit created a problem on RHEL 9 where python 3.9 is the default,
which doesn't support the match keyword used by the commit.
Python 3.12 can be installed as another option, but then another problem
shows up that RHEL 9 lacks recent enough python3-mako that works with 3.12
(or at least that's my understanding of the issue), so we don't know yet
whether Mesa is even buildable on that.
Fixes: f2bb6103 - "vulkan/cmd_queue: Rework copy codegen"
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40305>
Now that the SSO ABI is never used we can remove it with all of the
plumbing code required to find the fixed varyings bits.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
Now that everything is using the new layout to emit descriptors we can
switch to the compact ABI. Even without mediump lowering this should
still offer increased performance for separate shaders, it also unifies
OpenGL and Vulkan code paths.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
Midforst no longer uses auto32, we can hopefully remove the quirk and
never think about it again.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
Using explicit types makes the code more easier to reason about, there
is only one edge-case where we still need varying stores but it should
be removed soon.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
After switching to the new layout we do not need most of the old
varyings info code, remove it.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
Every compiler is now using varyings_layout, we can remove the old
nir_collect_varyings and live happily ever after.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
pan_cmdstream handles code common to both JM and CSF,
while valhall v9 didn't use CSF yet, they already used the
new version of the descriptor.
This lead to conditional compilation of various function with
similar names.
This commit renames some version-specific descriptor code to have
a versioned name, either `bi` (bifrost), `mid` (midfrost) or `val`
(valhall), to help with code comprehension.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>