Another case of not clearing the metadata correctly.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
There will be situations where we will want to use a local builder
rather than the one associated with NIR->backend translation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
v2: Use the upper bound of dual subslices as the ID is not remapped
with fused off parts and this is what we'll use for a bunch of
computation in RT.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.
The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling. Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.
We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.
One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains). That
said, it's still better than before.
For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).
shader-db results on Icelake:
total instructions in shared programs: 19605070 -> 19602284 (-0.01%)
instructions in affected programs: 65338 -> 62552 (-4.26%)
helped: 271 / HURT: 0
helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
95% mean confidence interval for instructions value: -10.71 -9.85
95% mean confidence interval for instructions %-change: -6.17% -5.43%
Instructions are helped.
total cycles in shared programs: 851854659 -> 851820320 (<.01%)
cycles in affected programs: 618749 -> 584410 (-5.55%)
helped: 271 / HURT: 0
helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108
helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06%
95% mean confidence interval for cycles value: -135.89 -117.54
95% mean confidence interval for cycles %-change: -6.72% -5.63%
Cycles are helped.
total sends in shared programs: 1025285 -> 1024355 (-0.09%)
sends in affected programs: 6454 -> 5524 (-14.41%)
helped: 271 / HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
95% mean confidence interval for sends value: -3.57 -3.29
95% mean confidence interval for sends %-change: -15.42% -14.54%
Sends are helped.
According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
This lets us vectorize gl_TessLevel{Inner,Outer} writes, using a pass
developed for RADV. Not all backends are prepared to handle this, so
we make it optional.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
VS, TCS, TES, and GS threads must end with a URB write message with the
EOT (end of thread) bit set. For VS and TES, we shadow output variables
with temporaries and perform all stores at the end of the shader, giving
us an existing message to do the EOT.
In tessellation control shaders, we don't defer output stores until the
end of the thread like we do for vertex or evaluation shaders. We just
process store_output and store_per_vertex_output intrinsics where they
occur, which may be in control flow. So we can't guarantee that there's
a URB write being at the end of the shader.
Traditionally, we've just emitted a separate URB write to finish TCS
threads, doing a writemasked write to an single patch header DWord.
On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is
a convenient spot to do so. But on other platforms, there's no such
field, and this write is purely wasteful.
Insetad of emitting a separate write, we can just look for an existing
URB write at the end of the program and tag that with EOT, if possible.
We already had code to do this for geometry shaders, so just lift it
into a helper function and reuse it.
No changes in shader-db.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
We're getting a number of UBSan error while running intel-clc in that
image. It seems that we're the first ones to run into a number of code
paths with intel-clc and it shows a number of undefined behavior
operations like signed extension stuff in NIR/IntelBackend, unaligned
pointer accesses in embedded list iterators, etc...
Preparing some patches in a different MR to fix this.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18788>
Debian (unlike Ubuntu) has a broken libclc package missing files we
would very much like to have in our image, so that intel_clc doesn't
fail. Namely :
/usr/lib/clc/spirv-mesa3d-.spv
/usr/lib/clc/spirv64-mesa3d-.spv
Dropping libclc from the distribution and build int the build & base
test image.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18788>
Debian stable and Fedora do not package the required version for
intel-clc.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18788>
This is needed by Anv to parse GRL (meta opencl kernels).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18788>
now that these aren't always lowered from indirects,
64bit function temp arrays may exist in shaders, and they need to use
all the same rewrite machinery
fixes#7310
SoroushIMG <soroush.kashani@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18728>
somehow 64bit lowering creates patterns like
vec1 64 ssa_1 = undefined
ssa_2 = unpack_64_2x32_split_x ssa_1
and then the 64bit value is never otherwise used. for this case, rewriting
the unpack to just be a 32bit undef allows the 64bit undef to be optimized
out, avoiding spec violations
fixes#6945
SoroushIMG <soroush.kashani@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18728>
some drivers (mostly desktop) are known to ignore this pipeline flag,
thus it can be set unconditionally in pipeline state to deduplicate
pipeline variants without affecting performance
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18787>
an implicit feedback loop occurs when an app happens to bind the same image
as both a framebuffer attachment and a sampler for the same draw
an explicit feedback loop occurs when an app uses fbfetch to read data back
from the framebuffer using input attachments
fbfetch is already handled, but implicit feedback loops require more work:
* detecting them happens on-the-fly
* pipeline variants are required
this handles implicit feedback loops by detecting them at draw time during
barrier updates and then flagging pipeline state change to trigger variant creation.
the bits are then unset when the framebuffer/sampler binds are removed
fixes#7309
fixes (tu):
KHR-GL46.texture_barrier.disjoint-texels
KHR-GL46.texture_barrier.overlapping-texels
KHR-GL46.texture_barrier.same-texel-rw-multipass
KHR-GL46.texture_barrier_ARB.disjoint-texels
KHR-GL46.texture_barrier_ARB.overlapping-texels
KHR-GL46.texture_barrier_ARB.same-texel-rw-multipass
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18787>
this was my (feeble) attempt at tracking validity for swapchain images,
but it doesn't actually work right, and zink_resource::valid is better
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18787>
The comment here is from the EGL_MESA_configless_context era, and maybe
that was true, but EGL_KHR_no_config_context has no such restriction.
The only restriction is that both surfaces be compatible with the
context, but a no-config context is defined to be compatible with any
surface.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18824>
It's a little difficult to see from the diff, but this is effectively
the same calling sequence as before, and more importantly it means the
backend only cleans up backend state rather than needing to call up to
the core.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18824>
... when we're only changing one of the two to zero. I could maybe
understand the old behavior here if we really did need both read/draw to
either be both from winsys or both user, but we don't. You can still
get to that half-and-half state even with the old code, you just need to
set whichever of read/draw to 0 that you want first, and then set the
other to the user fbo.
The first code to introduce a read fb to this path was dbfb375805,
which was trying to make sure we restored both values passed to
glXMakeCurrentRead when going back to fb 0. The commit message suggests
that "the API" doesn't allow split read/draw, but that seems to have
ignored the #ifdef EXT_framebuffer_object code immediately nearby which
did exactly that.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18824>
When multiview is used, the last pre-rasterization stage should export
the layer to the fragment shader. With GPL, the next stage is
implicitly FS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18699>