mathemtically, associativity is only defined for binary operations. I have no
idea what "associativity" would even mean for imad. I can kinda see the idea for
iadd3 but iadd3 should not be formed until after reassociating adds so the point
is moot. Unmark the
"associative" ternary operations, and assert that associativity implies binary.
nothing uses associativity yet, so this doesn't cause any functional change.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>
nothing currently uses the associative flag, but they will change soon. we need
to stop incorrectly marking fmul/fadd/etc as associative, because they're not,
but they almost are. distinguish these properties so we can correctly
handle floating point rules without any opcode-based special casing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>
this chain of fmul is deliberately chosen for floating point precision
reasons, it needs to be exact, or else we might try to reassociate it
and break subnormal handling.
avoids regressing dEQP-VK.glsl.builtin.precision.ldexp_subnormals.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>
As seen in the Vivante kernel driver context init, GPUs with the icache
feature have a new set of states to specify the shader ranges. While the
old state still seems to work, it limits the size of the shader that can
be executed to 64K instructions. The new range states holds up to 20 bits
according to the comment in the Vivante kernel driver, which allows up
to 1M instructions.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36114>
Cores with unified instruction memory get the start and end points of
the shaders via the shader range registers. Don't emit the unnecessary
START_PC and END_PC states on those cores.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36114>
When writing new shader instructions through the unified state area
we must tell the GPU which caches to flush by setting the appropriate
code steering bit. Failing to do this doesn't seem to have much of an
effect when only loading shaders through the state memory, but when
toggling between using icache (as in load shaders from memory) and
loading instructions from the state area, this fixes severe corruption
and GPU hangs due to old code being executed.
Programming the steering bits is only needed for GPUs with either
unified instruction or unified uniform states.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36114>
Bit 0 of the SH_CONTROL register does not control uniform cache
flushes so stop touching this bit when updating the uniforms.
While it is harmless to change the bit at this time in the emit
sequence, it's confusing and not needed.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36114>
Update from rnndb commit 19bc9377a80a ("rnndb: rename
UNIFORM_CACHE to CONTROL and document code cache flushing")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36114>
Replace uses of brw_builder::at() with various more descriptive
variants. Use block pointer from instruction when possible.
A couple of special cases remained and will be handled in separate patches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34681>
The imod instructions are lowered to 4 alu instructions each. We can do
better by packing the results with the values for kz.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
Instead of using fp64 (Which is broken in some cases) the new approach
only uses fp32 and implements tiebreaking for edge/vertex hits. Using
fp32 is also much faster, improving performance of q2rtx by around 40%.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
The GL driver expects special sync handling when a buffer is newly
exported, and also requires that bo->prime_fd be set so the batch code
can use it later. Add a function to do this for the KMS export case,
which otherwise would not need a PRIME fd.
agx_bo_export() then becomes a simple dup of bo->prime_fd (which is
probably marginally faster than redoing drmPrimeHandleToFD() anyway).
The thread safety story here is that as long as we do all this the first
time a BO is exported (in any way), there is no way for another thread
to have gotten ahold of the BO already, so no need for extra locking.
This does not affect hk, since it doesn't rely on bo->prime_fd for
anything. It also doesn't affect the timestamp BO and other special
cases.
Fixes: 067d820c9d ("asahi: Mark KMS exported resource BOs as shared")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13563
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36241>