After all int64/double lowerings, there might still be 64b registers
left which ir3 currently doesn't handle. This only happens in a small
number of Piglit tests where those registers (or the variables they come
from) did not get DCE'd.
This patch handles 64b registers in ir3 by adding a NIR pass that does
the following:
- @decl_reg -> split in two 32b ones
- @store_reg -> unpack_64_2x32_split_x/y and two separate stores
- @load_reg -> two separate loads and pack_64_2x32_split
After this pass, the 64b vecs used for the original loads/stores are
still present and are also not handled yet by ir3. This patch removes
them by running nir_lower_alu_to_scalar and nir_copy_prop.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26175>
We only support the variablePointersStorageBuffer feature of variable pointers,
which basically ensures that pointers may only target one buffer. This means
that a particular pointer may change where it points within a given buffer but
it cannot change its value to point to some other buffer. This is a requirement
from us since we expect buffer indices on buffer loads and stores to be
constant, so we can't have a buffer load come through a pointer that may
be assigned to different buffers, since in that case the buffer index
would need to come from bcsel.
There is, however, a small complication: the spec still allows pointers to
be null, and NIR defines null pointers to use 0xffffffff for both the buffer
index and the offset, which will cause a problem in a scenario like this:
int *b = ...
if (cond) {
b = null;
discard;
}
ubo_load(b);
Here the buffer index for the ubo load may come from a bcsel choosing between
the null pointer (0xffffffff) and the valid address (let's say 0), so we don't
have a constant and we assert fail.
This change detects this scenario and upon finding it will rewrite the buffer
index on the null pointer branch of the bcsel to match that of the valid
branch so that later optimizations passes can remove the bcsel and we end up
with a constant index. This is fine because a null pointer dereference is
undefined behavior and it is not something we should see in valid applications.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
Otherwise we may emit a load from an invalid offset from
a lane that was discarded.
This fixes an simulator assert from triggering when
executing:
dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_null_pointer_load
That test emits a conditional kill and then a buffer load
which would have invalid offsets for the lines killed. Since
the buffer load is in uniform control flow we were incorrectly
emitting a full quad load, including disabled lanes which would
prompt the simulator to assert on invalid offsets being loaded
coming from the lanes that had been killed in the shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
ROI (region of interest) feature implementation
in va.
It does not support ROI priority, and supports
qp delta, and the maximum number of supported
region is defined as 32, the region sequence implies
the priority, the lower the sequence number, the higher
the region priority, when region overlapping happened,
the higher priority region overwrites the lower
priority one.
And specifically for AV1, the adjust step will be
rounded by 5 when rate control is used, for example,
if qp_delta (q index) is 6, it will use 5, if
qp_delta is 8, it will use 10.
For AVC/HEVC (RC/CQP) and AV1 CQP mode, the
qp_delta granularity is 1.
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26659>
This is for enabling ROI (region of interest)
feature in VAAPI interface. It will support
both CQP and rate control mode.
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26659>
Let the kmod backend know when we might end up exporting a BO. This
doesn't change anything for the Panfrost kmod backend, but will be
needed for Panthor.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
Check VkExportMemoryAllocateInfo to know if we might export the BO
object at some point.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
This flag reflects the ability to share a BO. This lets the kmod
backend optimize the case where the BO stays private to a specific
VM.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
We keep the existing implementation unchanged but use pan_kmod for
all interactions with the kernel driver.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
The current code assumes the logical device comes with a VM, so let's
explicitly create this default VM so we can map BOs with the kmod API.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
Back panfrost_device with pan_kmod_dev object and query all props using
the pan_kmod_dev_query_props().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
We are about to delegate some BO-related operations to the pan_kmod
layer, but before we can do that, we need to hide panfrost_bo
internals so we can redirect such accesses to pan_kmod.
Provide panfrost_bo_{size,handle}() accessors and start using them.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
We are about to delegate some device-related operations to the pan_kmod
layer, but before we can do that, we need to hide panfrost_device
internals so we can redirect such accesses to pan_kmod.
Provide a few panfrost_device_xxx() accessors and start using them.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
Just a few implementation details that are worth mentioning:
- Panfrost doesn't support explicit VM management. This means
panfrost_kmod_vm_create() must always be created with
PAN_KMOD_VM_FLAG_AUTO_VA and panfrost_kmod_vm_bind(op=map) must
always be passed PAN_KMOD_VM_MAP_AUTO_VA. The actual VA is assigned
at BO creation time, and returned through
drm_panfrost_create_bo.offset, or can be queried through
DRM_IOCTL_PANFROST_GET_BO_OFFSET for imported BOs.
- Evictability is hooked up to DRM_IOCTL_PANFROST_MADVISE.
- BO wait is natively supported through DRM_IOCTL_PANFROST_WAIT_BO.
The rest is just a straightforward translation between the kmod API and
the existing panfrost ioctls.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
We have generic BO management and device management layers that
directly call kernel driver-specific ioctls. The introduction of
Panthor (the new kernel driver supporting CSF hardware) forces us to
abstract some low-level operations. This could be done directly in
pan_{bo,device,props}.{c,h}, but having the abstraction clearly defined
and separated from the rest of the code makes for a cleaner
implementation.
This is also a good way to get a low-level KMD abstraction that
we can use without pulling all sort of gallium-related details in,
which will be important for some refactoring we plan to do in panvk.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
We set max subgroup size as 16 for 'UnrealEngine5.1', this improves a
customer benchmark by 50% on A750.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26385>
When we moved the bulk of glsl_type to C, these globals were
kept to avoid changes to compiler/glsl code in the MR. Now that
landed, change the code to use the actual bultins directly.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26658>
These should probably go on the instance but everything is tangled up
too badly right now. This at least moves them to some place where we
have them without a nouveau_ws_device. It's fine to do this because
debug flags are an environment variable and won't change across a run.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26615>
This breaks the dependency pass in two. The first pass builds a
dependency graph, including first-wait information for each barrier.
The second applies uses the newly constructed dependencies to place
barriers. This fixes at least two known bugs:
1. We were placing redundant write barriers. In the case where we did
a load, for example, we would add read barriers for the address and
write barriers for the result. In the fairly common case where the
result is used before someone tries to overwrite the address, we
don't actually need both barriers because a wait on the result
implies a wait on the sources.
2. There were a bunch of WaR cases which weren't being handled
correctly. In particular, when a variable-latency instruction read
a register and then a fixed fixed-latency instruction read it, the
fixed-latency read would replace the variable latency read. When we
then wrote that value with a fixed-latency instruction, we wouldn't
see the hazard. This commit fixes it by replacing the single last
use per reg with a Vec of uses in the case of reads.
This fixes all known 1.1 memory model fails.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26615>