We just want to emit an instruction, but we don't need to do anything
further with it, so we don't need to store the resulting inst pointer
anywhere.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
load_uniform does not have io_semantics and component.
Fixes: a83fd26 ("nir/print: stop trying to match i/o vars using base/driver_location")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28962>
This way we don't have to constantly copy the full thing at kernel
creation time lowering CPU overhead significantly.
With the previous changes clCreateKernel is basically for free.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28872>
We do not support it at runtime anyway and assert on them to be unique
across devices at build time. This significantly reduces overhead of
clCreateKernel as this is something applications actually rely on being
fast.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28872>
fsign's result can be +0.0 or -0.0 for -0.0. We already calculate
the signed zero, it's even faster to replace the fmul(fsign(x), ...) with ior.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28938>
If the primary core is a GPU, but a separate NPU exists, collect
NPU specs in addition to GPU specs.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28921>
Add a separate pipe for the NPU device when the primary device is a GPU.
In case of compute-only contexts, prefer to use the separate NPU pipe.
This allows to create a compute-only context that uses the NPU pipe on
a screen that has a 3D GPU as primary device.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28921>
Allow to pass both gpu and npu to etna_screen_create() separately,
in preparetion for devices with both 3D GPU and NPU.
Iterate over all cores or until both GPU and NPU are found.
If no 3D GPU was found, screen->gpu will be set to the npu as well,
so nothing changes for NPU-only devices.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28921>
Calling etna_gpu_new() with a nonexisting core can happen when iterating
all cores. Bail immediately if querying the model failed, there is no
use in also failing to query the revision.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28921>
The -ENXIO return value isn't necessarily an error condition.
When iterating over cores, this signals that there are no more
cores to be found.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28921>
For early error returns, all pipeline handles have to be destroyed.
Otherwise the caller will treat those valid handles as successfully
created pipeline objects.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28944>
With explicit sync, only if it wasn't done earlier for FIFO.
Prevents potentially unbounded memory usage for (wl_buffer.release
events in) the queue, since we don't dispatch the queue anywhere else
with explicit sync.
v2:
* Use wl_display_dispatch_queue_pending instead of
wl_display_dispatch_queue_timeout. (Sebastian Wick)
* Call it from wsi_wl_swapchain_queue_present instead of
wsi_wl_swapchain_acquire_next_image_explicit. (Joshua Ashton)
Fixes: 5f7a5a27ef ("wsi: Implement linux-drm-syncobj-v1")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28874>
The constant-folding definition and comments say that it takes the high
16 bits of the first source and low 16 bits of the second source, but
actually it's the opposite. The algebraic optimization, which actually
happens and needs to be correct, was correct but the comment above it
was wrong.
Note that in the way we use it when lowering multiplications, the
ordering doesn't matter.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
In the future we will have many ALU instructions passing shared
registers to each other, and surrounding them each with moves to/from
shared registers will severely bloat the IR size coming out of NIR and
make more pointless work for copy propagation. Instead, do something
more like the ACO approach and allow values stored in the hash table to
be shared, and move the burden of emitting a mov to non-shared to
ir3_get_src(). We will then use ir3_get_src_shared() or
ir3_get_src_maybe_shared() as appropriate in cases where we can handle
shared registers or where we can handle both.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
Propagate shared-ness to the the source, and when creating an ALU
instruction with all scalar sources and the instruction is supported on
the scalar ALU, automatically make the destination scalar. This includes
MOV/COV.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
In addition to replacing existing no-longer-needed usage of the
readfirst macro, we will use this for other NIR ALU instructions that
need to materialize constants when they use the shared ALU.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
When we use the scalar ALU we will start inserting moves with different
API-level semantics from readInvocation() or readFirstInvocation(). We
need to distinguish between these moves and lowered readInvocation()
moves, to avoid unnecessarily keeping helper invocations alive when
inserting (eq).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
cat1-cat4 instructions executed on the shared ALU can use shared
registers in an unlimited capacity, as opposed to the vector ALU which
apparently treats shared registers and consts the same. However they
cannot use "normal" sources (which must be "uniformized" via a mov).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
On a650 and later, there is a "scalar ALU" capable of executing cat2
instructions, a subset of cat3 instructions (csel but *not* mad), and
cat4 instructions. There is also another copy of the scalar ALU embedded
in HLSQ, which is responsible for executing preambles with the "early
preamble" bit set. The two new features are closely intertwined, because
the scalar ALU makes it possible to make most preambles only use shared
registers, letting us optimistically use shared registers and only fall
back to normal preambles if we ran out of shared registers. But the
scalar ALU is also generally useful for moving calculations of uniform
values like loop indices to the scalar ALU to reduce normal register
pressure and increase parallelism, because like SFU/EFU and texture
units different waves can execute ALU and scalar ALU instructions in
parallel.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
I got this wrong before because I missed the need for (ss), once that
was fixed then a move from anything to a shared register is legal,
include non-shared registers, as long as all active channels have the
same value.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
Copies from normal to shared registers are only allowed architecturally
if all of the active threads have the same value for the normal
register, which means that they can normally be propagated into e.g. ALU
instructions or other copies. However, there are a few instruction types
where this is not (currently) allowed, namely the scan macro where the
source is tied to a shared destination and the collect/split macros
where the lowering doesn't currently allow differently-typed sources and
destinations (although we may want to allow that in the future), so we
need to clean up ir3_valid_flags() to catch that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
We now use INVALID_REG to mean that a source or destination does not
have a preassigned register. We ignore this for anything but inputs and
outputs for now, but don't make it look like we're preassigning the
array to r0.x. This also will allow us to assert that preassigned
registers are in the correct range.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
This is what the rest of ra validation uses, because it returns the
correct thing for arrays (i.e. the base of the array, instead of the
actual register accessed). num is sometimes not set, so it was causing
spurious assertion failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>
In the past we avoided emitting pure 16-bit subgroup macros because of
this bug, but because we're going to start emitting the special moves
they expand to directly, we also have to handle the bug directly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22075>