This matches what the tgsi path does and doesn't regress any tests. (For
comparison, unlimited join nesting does regress tests in deqp and piglit)
Fixes graphical artifacts from stack overflows in
https://www.shadertoy.com/view/Xds3zN
with nir on kepler
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15597>
This will be helpful in regression-testing the nir-to-tgsi transition, and
with the big runners at google we have plenty of capacity to do it.
I dropped the GL3.0-3.2 caselists because GL4.3 should be a superset of
them.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15158>
The TGSI backend avoids TGSI_OPCODE_FMA (and thus OP_FMA) pre-nvc0,
replacing it with TGSI_OPCODE_MAD in that case.
Noticed when looking at native-NIR stats and finding that load
optimization wasn't taking place on the unsupported opcode.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15543>
The host virglrenderer can only handle moves to integer outputs, all
ALU opt that create integer outputs are created with extra code to convert
to float for the temporaries, and this breaks the output write
handling.
Fixes:
spec@arb_sample_shading@builtin-gl-sample-mask *
spec@arb_sample_shading@builtin-gl-sample-mask-simple *
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15921>
If we made a copy deref, then we need to do dead-write elimination for the
pervious writes or we'll just emit the same copy deref again next time
around. And, at the end of the opt loop, we need to lower copy derefs
because later passes (locals_to_regs, notably) depend on it.
Fixes infinite opt loop on fs-function-inout-array with virgl on NTT.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15899>
rusticl (and clover) would like to get a graceful fail here so they can
fall back to a shadow copy instead of us asserting. We also start
rejecting arrayed surface because isl doesn't allow selecting a QPitch
yet. Even if it did, QPitch is horribly restrictive, even for linear
surfaces, that it likely wouldn't be that useful.
Fixes: e81f3edf76 ("iris: Allow userptr on 1D and 2D images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15903>
anv sets the default EDSC flag, do the same for iris too
Fixes: 5ae278da18 ("iris: use vtbl to avoid multiple symbols, fix state base address")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15905>
Suggested by Francisco Jerez.
Although including VF invalidation in the flush bits is strange, we
believe this is the only way to guarantee that stream output has
finished.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
FLUSH_HDC is sufficient to flush things out to L3, so we'd rather
use that where possible. It's also emulated via DATA_CACHE_FLUSH
on platforms where it isn't supported, so we can use it unconditionally.
We still use DATA_CACHE_FLUSH for invalidating the data cache, and to
flush the DC-tagged cachelines in L3 to be globally-observable.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Push constant loading is not coherent with L3 according to the document
that describes the hardware change for the vertex buffer L3 Bypass
Disable field.
If we've updated a push constant buffer with say, a blorp_buffer_copy,
we may need to flush both the render cache and the tile cache.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
We should be using the cache tracker for this. We can consider
this access IRIS_DOMAIN_OTHER_READ now that it's the catch-all
non-L3-coherent read-only access domain.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
When stream output is active, we need to let the cache tracker know
about any SO buffers, which we access via IRIS_DOMAIN_OTHER_WRITE.
In particular, we may have written to those buffers via another
mechanism, such as BLORP buffer copies. In that case, previous writes
happened via IRIS_DOMAIN_RENDER_WRITE, in which case we'd need to flush
both the render cache and the tile cache to make that data globally-
observable before we begin writing via streamout, which is incoherent
with the earlier mechanism.
Fixes misrendering in Ryujinx.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6085
Fixes: d8cb76211c ("iris: Fix MOCS for buffer copies")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Most clients are L3-coherent these days. However, there are some
notable exceptions, such as push constants, stream output, and command
streamer memory reads and writes.
With the advent of the tile cache, flushing the render or depth caches
alone are no longer sufficient for memory to become globally-observable.
For those, we need to flush the tile cache as well. However, we'd like
to avoid that for L3-coherent clients, as it shouldn't be necessary,
and is expensive.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The render, depth, sampler, and data (HDC) caches are all coherent
with L3. We consider OTHER_READ and OTHER_WRITE to be non-coherent,
as they're kitchen-sink domains which include non-L3-clients.
Starting with Tigerlake, the VF cache is coherent with L3 (because we
set the L3BypassDisable bit in the vertex/index buffer packets).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
On Tigerlake, we use the data cache for reading indirect UBOs instead
of the sampler. But we still use the constant cache for direct UBO
access, so unfortunately we may access it through two different domains.
To work around this, we add a new domain for pull constants (UBOs),
which will be either constant+texture or constant+data.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The bulk of IRIS_DOMAIN_OTHER_READ domain usage was the 3D sampler, but
there were also a few oddball cases like command streamer reads, blitter
access, and so on. The sampler is definitely L3 coherent, but some off
the more esoteric reads may not be, so I'd like to separate them, so
that OTHER_READ can become a non-L3-coherent kitchen-sink domain.
The sampler cases only need TEXTURE_CACHE_INVALIDATE, and can skip the
CONSTANT_CACHE_INVALIDATE we had on IRIS_DOMAIN_OTHER_READ.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
We were using IRIS_DOMAIN_OTHER_READ for read-only depth/stencil access
in an attempt to avoid unnecessary flushing; IRIS_DOMAIN_DEPTH_WRITE
could indicate read-write access.
However, IRIS_DOMAIN_OTHER_READ is clearly the wrong domain. Depth and
stencil data is read via the depth cache, while IRIS_DOMAIN_OTHER_READ
currently corresponds to the sampler cache and constant cache together
(although this will change in future patches).
It's unclear whether this hack was useful. For now, just drop it and
use the correct depth cache domain, even if it's marked as read-write.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
For texture fetches and buffer load the fix is not needed,
and the override creates faulty TGSI.
In addition remove all modifiers from the src in the additional mov
instruction.
Fixes: d1c7a7b131
virgl: Add an extra mov for int outputs from constant and immediate inputs
v2: Move workaround after the use of
virgl_tgsi_rewrite_src_for_input_temp (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15896>
now zink will init using a priority system if multiple devices are available
multiple devices will ONLY be available if:
* the user does not specify VK_ICD_FILENAMES as they should
* the user does not specify LIBGL_ALWAYS_SOFTWARE
* multiple drivers exist
I've prioritized the virtualized gpu here with the assumption that if
such a thing is detected, the environment is most likely virtualized
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15857>
now that it's easier to determine whether zink is being used (mostly),
this whole thing can be simplified
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15857>
silently failing on release builds is annoying
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15851>
It can happen that a single vector constant is referenced multiple times
by the same node, with different swizzles.
This needs to be taken into account by checking and updating the
swizzles for all the srcs of a target node when inserting the const
node to the same instruction.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15726>
When emitting the AR forces splitting an ALU group, and at the same time
a new CF instruction is started, then the last instrcution in the finished
CF block might not have the "last" bit set, which results in an invalid
shader that might hang, or crash SB.
So when a new CF is started, force the last bit in the last ALU instruction.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
With the NIR code, we have instructions groups that use
INTERP_LOAD_P0 that don't fill all slots. Just make sure
the backend scheduler doesn't fill in INTERP_LOAD_P0
instructions with a different LDS location.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
Since the value is not written, there is no need to allocate
a register for it, so don't take it into account.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
ALU_SRC_PARAM_BASE is an inline constant that defines the
address for pulling data from LDS memory for interpolation
and not a value from the kcache, so there is no need to
take these values into account when allocating kcache
load slots.
v2: Fix the constant range check to not exclude the translated
ranges for kcache banks 2 and 3.
v3: limit range check to only include kcache values and and
rename relevant function (Emma).
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
virglrenderer doesn't properly emit the conversion code when the source
is a integer value and the output is also integer.
Fixes on NTT:
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.inverse_per_*
v2: fix typo (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
The host driver will optimize unused variables away, and checking thoroughly whether we
may need an extra temp is just uselessly costly.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
NIR doesn't propagate precise through moves, and with NTT the
last output is usually preceded by a move, so that we no longer
see that the evaluation of some value is supposed to be exact,
and, hence we can't decorate the outputs accordingly.
Fixes with NTT:
dEQP-GLES31.functional.tessellation.common_edge.
triangles_equal_spacing_precise
triangles_fractional_odd_spacing_precise
triangles_fractional_even_spacing_precise
quads_equal_spacing_precise
quads_fractional_odd_spacing_precise
quads_fractional_even_spacing_precise
v2: Don't clear the precise flag when we hit a mov, because we may
hit a if/else construct like below and we don't track branches
IF X
TEMP[0] = OP_PRECICE ...
ELSE
TEMP[0] = MOV CONST[]
ENDIF
Thanks Emma for pointing out the problem.
v2: allocate precise handling flags to transform_prolog (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
In addition really fill all slots, because otherwise the alu-group merger
might move a read from the indirectly written register into the 't' slot.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15848>
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>