RA was manually fiddling with regs to copy over the parallel copy code,
which has to be done in a different way, but if we switch this all over
at once it shouldn't be a problem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
Also change the indexing in ir3_delayslots, so it's finally sane! To do
this we also have to change foreach_ssa_src_n to index srcs instead of
regs, so that the indexing stays in sync.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
Instructions that operate on an array read the previous state of the
array, modify it, and write a new array, at least conceptually before
RA. Previously the same register specified the previous state and acted
as the new state, but this meant that it was both a source and
destination which meant that it was getting in the way of splitting up
sources and destinations. Break out the source into a separate register,
and use the new tied-src infrastructure to share code with a6xx atomics.
With this, there are basically no more special cases for arrays in RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
Previously this was hard-coded for a6xx atomic instructions. However
we'll need a way for array destinations to point to the source with the
previous value of the array when we split them up. This is conceptually
the same as tied source/destinations for a6xx atomics, except that array
writes sometimes won't have a previous value to point to. So move this
into the IR so that it can be more dynamic. As a bonus we can move the
knowledge of a6xx atomics out of RA, where it's out-of-place, and into
the a6xx-specific code that creates them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11469>
Initial implementation missed various fields that derive from the
primitive topology. This patch fixes 3DSTATE_RASTER/3DSTATE_SF,
3DSTATE_CLIP and 3DSTATE_WM (gen7.x) emission in the dynamic case.
Fixes: f6fa4a8000 ("anv: add support for dynamic primitive topology change")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4924
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11379>
Validation layers should warn you about this
(VUID-VkBindBufferMemoryInfo-size-01037) but this would be useful for
zink debugging.
Requested by Zmike.
v2: Also check memoryOffset (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11542>
Right now the accumulator-clearing move emitted by the generator for
Wa_14010017096 inherits the SWSB field from the previous instruction.
This can lead to redundant synchronization, or possibly more serious
issues if the previous instruction had a TGL_SBID_SET SWSB
synchronization mode. Take the SWSB synchronization information from
the IR.
Fixes: a27542c5dd ("intel/compiler: Clear accumulator register before EOT")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
This is unlikely to have had any negative side effect on the original
TGL, but will lead to issues on XeHP+ if the software scoreboard pass
isn't able to synchronize the accumulator writes.
Fixes: a27542c5dd ("intel/compiler: Clear accumulator register before EOT")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
In cases where an in-order instruction is overwriting a register
previously read by another in-order instruction, drop the dependency
iff the previous read is guaranteed to have occurred from the same
in-order pipeline. This should only have an effect on XeHP+ since
previous Xe platforms only had one in-order FPU pipeline.
The previous workaround we were using for this treated all ordered
read dependencies as write dependencies to avoid noise from our
simulation environment. Relative to our previous workaround this
improves performance of GFXBench5 gl_tess by ~7% on a DG2 system
among other single-digit percentual FPS improvements.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
The hardware fails to provide the expected data coherency guarantees
for accumulator registers when accessed from multiple FPU pipelines.
Fix this by tracking implicit accumulator accesses just like we do for
regular GRF registers, but instead of adding synchronization
annotations for any dependency we only do it for dependencies with a
pipeline mismatch, since the hardware should be able to guarantee
proper synchronization for matching pipelines.
Note that this workaround handles RaW and WaW dependencies in addition
to the WaR dependencies described in the hardware bug report even
though cross-pipeline RaW accumulator dependencies should be extremely
rare, since chances are the hardware will also hang if we ever hit
such a condition. This only affects XeHP+, since all FPU instructions
are executed as a single in-order pipeline on earlier Xe platforms.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
This change reduces the precision of the scoreboard data structure for
accumulator registers, because the rules determining the aliasing of
accumulator registers are non-trivial and poorly documented (e.g. acc0
overlaps the storage of acc1 when the former is accessed with an
integer type). We could implement those rules but it wouldn't have
any practical benefit since we currently only use acc0-1, and for the
most part we can rely on the hardware's accumulator dependency
tracking. Instead make our lives easier by representing it as a
single register.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
This fixes an issue where one context only creates buffers while another
context only destroys buffers. Only the creating context can release its
buffers and the destroying context only turns them into zombie buffers.
This fix makes the creating context release its zombie buffers.
It's not a plot from an apocalyptic movie.
Fixes: e014e3b6be "mesa: don't count buffer references for the context that created them"
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4840
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11514>
this is another prim type bitmask which will trigger automatic draw rewriting
to a direct draw any time a prim-restart draw occurs with a prim type that is
not supported by the driver for prim restart, even if that prim type is supported
for normal drawing
the default is set to all prim types to preserve existing functionality, and PrimitiveRestartForPatches
is now explicitly set to false because no driver supports it
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10973>
drivers can now export a bitmask of the primitive types they support,
and all others will be automatically be rewritten
the default value is set to all primitive types supported to preserve
existing behavior
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10973>
for drivers that set it, this now automatically handles restart index rewriting
by running draws through primconvert when necessary
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10973>
once a draw reaches primconvert, it should never be able to reach the driver
until all draw operations have been converted as necessary
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10973>
this is a bit more work, as the primitive restart needs to be rewritten into a multidraw,
then the multidraw converted to the new primitive type and serialized back into a
single draw
detection is handled using a new primconvert config member, which is set to the full
primtype mask by default for compatibility with existing drivers
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10973>